
See UK Released

See UK (http://apps.seme4.com/see-uk) is a simple visualisation of data that has geographic aspects and has been published as machine-interpretable Linked Data.

This site uses data that has been sourced from data.gov.uk and processed into Linked Data where necessary, but it is also designed to use other sources where available. All the datasets are then enriched by calculating area totals from point data and by inferring aggregate values for regions that have no explicit data values, and they are further enriched by establishing links between the datasets. These enriched datasets are available directly from the EnAKTing Project at http://crime.rkbexplorer.com/, http://transport.psi.enakting.org/ and http://absenteeism.psi.enakting.org/. Some of the data can also be accessed using the Linked Data API at http://puelia.psi.enakting.org.
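As a rough illustration of the enrichment step described above, the following Python sketch sums point data into area totals, then infers values for parent regions that have no explicit data by summing their children. It is not the EnAKTing pipeline. The function names, the point-to-area lookup and the region hierarchy are all hypothetical, and summing is only appropriate for additive quantities such as counts.

```python
from collections import defaultdict

def area_totals(points, area_of):
    """Sum point values (e.g. incidents at a location) into per-area totals.

    points:  iterable of (location_id, value) pairs
    area_of: dict mapping location_id -> area_id (hypothetical lookup table)
    """
    totals = defaultdict(int)
    for location, value in points:
        totals[area_of[location]] += value
    return dict(totals)

def infer_parent_values(values, children):
    """Infer a value for each parent area that lacks one by summing its children.

    values:   dict area_id -> explicit value
    children: dict parent_id -> list of child area_ids (assumed acyclic)
    A parent is left out if any child needed for its total is unknown.
    """
    inferred = dict(values)  # do not modify the caller's dict

    def value_of(area):
        if area in inferred:
            return inferred[area]
        kids = children.get(area)
        if not kids:
            return None  # no explicit value and nothing to aggregate from
        child_values = [value_of(kid) for kid in kids]
        if any(v is None for v in child_values):
            return None  # incomplete children: do not guess a total
        inferred[area] = sum(child_values)
        return inferred[area]

    for parent in children:
        value_of(parent)
    return inferred
```

Leaving a parent without a value when any child is unknown is a deliberately cautious choice; a real pipeline might instead estimate the missing part.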

It runs on servers provided by the EnAKTing Project and Electronics and Computer Science at the University of Southampton.

The visualisation provides a view centred on a chosen region of the specified size, and most noticeably gives a “pie-chart” that shows the viewer how that region compares with similar regions around it. It is thus designed to focus on the information most relevant to the user. Colour indicates the “worst” (red) and “best” (green) areas among those shown. This pie-chart is shown in preference to simply colouring the map itself, as a coloured map confuses the map features with the data being visualised.
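One simple way to derive a red-to-green colouring relative to the regions currently shown is to scale each value linearly between the minimum and maximum of those regions. The sketch below assumes this linear scale and a hypothetical `higher_is_worse` flag; See UK's actual colour scheme may differ.

```python
def colour_for(value, shown_values, higher_is_worse=True):
    """Map a value to an (r, g, b) colour from green (best) to red (worst),
    relative to the regions currently shown on screen."""
    lo, hi = min(shown_values), max(shown_values)
    # Position of the value within the displayed range: 0.0 at lo, 1.0 at hi
    t = 0.5 if hi == lo else (value - lo) / (hi - lo)
    badness = t if higher_is_worse else 1.0 - t
    return (round(255 * badness), round(255 * (1 - badness)), 0)
```

Under this assumption the scale is recomputed from the regions in view, so a region's colour depends on its neighbours rather than on a fixed national scale.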

It also shows the real geography involved, so that the full picture is seen. The user can navigate by clicking on either the pie-chart or the map, and so can move around using whichever view of the data they prefer. Search by postcode is also supported, to help the user find specific locations.

An important aspect of the visualisation is that cross-dataset correlation can be achieved and presented in a natural fashion, as the data can be viewed normalised by population or by area, as well as in raw values. The user can therefore see how regions compare in terms of, for example, crime density by population or by area, rather than just knowing that their county has little crime and guessing whether this is because the county has a small population or area.
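The effect of normalisation can be shown with a few lines of arithmetic. In this sketch the two regions and their figures are invented purely for illustration, and the per-1,000-people and per-km² units are assumptions rather than the units See UK uses.

```python
def normalise(count, population, area_km2):
    """Return a raw count alongside population- and area-normalised rates."""
    return {
        'raw': count,
        'per_1000_people': 1000 * count / population,  # rate per 1,000 residents
        'per_km2': count / area_km2,                   # density per square kilometre
    }

if __name__ == '__main__':
    # Hypothetical figures, for illustration only
    regions = {'Region A': (500, 50_000, 100), 'Region B': (800, 160_000, 400)}
    for name, (count, population, area) in regions.items():
        print(name, normalise(count, population, area))
```

With these made-up figures, Region B has more crime in raw terms (800 against 500) but half of Region A's rate per 1,000 people and less than half of its rate per km².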

See UK has been produced as a collaborative activity between Seme4 Ltd and members of the EnAKTing project at the University of Southampton.

For further details please contact Hugh Glaser or Ian Millard; feedback on this application is very welcome.

Geonames

In order to get a list of countries and their alternative names from the Geonames RDF Dump, we can do:

SELECT DISTINCT ?country ?name ?altname
WHERE
{?foo <http://www.geonames.org/ontology#parentCountry> ?country .
?country <http://www.geonames.org/ontology#name> ?name .
?country <http://www.geonames.org/ontology#alternateName> ?altname .
FILTER(lang(?altname) = "en")
}

This returns 344 results which, as we expect, look much like the following (in JSON format):

      {
        "country": { "type": "uri" , "value": "http://sws.geonames.org/2635167/" } ,
        "name": { "type": "literal" , "value": "United Kingdom of Great Britain and Northern Ireland" } ,
        "altname": { "type": "literal" , "xml:lang": "en" , "value": "Britain" }
      } ,
      {
        "country": { "type": "uri" , "value": "http://sws.geonames.org/2635167/" } ,
        "name": { "type": "literal" , "value": "United Kingdom of Great Britain and Northern Ireland" } ,
        "altname": { "type": "literal" , "xml:lang": "en" , "value": "UK" }
      } ,
      {
        "country": { "type": "uri" , "value": "http://sws.geonames.org/2635167/" } ,
        "name": { "type": "literal" , "value": "United Kingdom of Great Britain and Northern Ireland" } ,
        "altname": { "type": "literal" , "xml:lang": "en" , "value": "U.K." }
      } ,
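Because each alternative name produces a separate row, a client will often want to group the rows back together per country. Below is a minimal Python sketch that assumes the standard SPARQL 1.1 JSON results layout (`head`/`results`/`bindings`) and the variable names used in the query above. `altnames_by_country` is a hypothetical helper.

```python
import json
from collections import defaultdict

def altnames_by_country(results_json):
    """Group the ?altname values of each ?country from SPARQL JSON results.

    Returns {country_uri: (name, sorted list of alternative names)}.
    """
    data = json.loads(results_json)
    names = {}
    altnames = defaultdict(set)  # a set drops duplicate rows
    for row in data['results']['bindings']:
        country = row['country']['value']
        names[country] = row['name']['value']
        altnames[country].add(row['altname']['value'])
    return {c: (names[c], sorted(alts)) for c, alts in altnames.items()}
```

Applied to the three rows above, this would yield one entry for http://sws.geonames.org/2635167/ with the alternative names "Britain", "U.K." and "UK".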

Fast SPARQL XML Results Parser in Python

For one of our projects we need results from SPARQL endpoints as quickly as possible, with little to no need for validation.

As such, I rewrote our original SPARQL XML results parser to use Expat, a fast, non-validating XML parser.

The results are returned as a list of dicts, one per result, each in roughly the same form as the bindings part of the SPARQL JSON results format.

Example of use:

sp = SparqlParser()
results = sp.Parse(xmlstring)

Code:

import xml.parsers.expat

# Fast Expat based SPARQL stream parser Copyright (c) 2011 Daniel Alexander Smith, University of Southampton
class SparqlParser:
    """Parse SPARQL XML results into a list of dicts, one per result.

    Each dict maps a variable name to {'value': ..., 'type': ...}, roughly
    mirroring the "bindings" part of the SPARQL JSON results format.
    """

    # Elements whose text content is the bound value
    VALUE_ELEMENTS = ('literal', 'bnode', 'uri')

    def __init__(self):
        self.results = []           # completed result rows
        self.current = {}           # bindings of the row being parsed
        self.current_name = ""      # variable name of the current <binding>
        self.current_chars = ""     # accumulated text of the current value
        self.current_type = ""      # 'literal', 'bnode' or 'uri'
        self.getting_chars = False  # True while inside a value element
        self.parser = xml.parsers.expat.ParserCreate()
        self.parser.StartElementHandler = self.start_element
        self.parser.EndElementHandler = self.end_element
        self.parser.CharacterDataHandler = self.char_data

    def start_element(self, name, attrs):
        if name == 'binding':
            self.current_name = attrs['name']
        elif name in self.VALUE_ELEMENTS:
            self.current_type = name
            self.getting_chars = True

    def end_element(self, name):
        if name == 'binding':
            self.current[self.current_name] = {'value': self.current_chars, 'type': self.current_type}
            self.current_chars = ""
        elif name in self.VALUE_ELEMENTS:
            self.getting_chars = False
        elif name == 'result':
            self.results.append(self.current)
            self.current = {}

    def char_data(self, data):
        # Expat may deliver one text node in several pieces, so accumulate them
        if self.getting_chars:
            self.current_chars += data

    def Parse(self, data):
        # The whole document is passed in one call, so mark it as final: this
        # makes Expat finish parsing and report truncated input. An Expat
        # parser cannot be reused, so create a new SparqlParser per document.
        self.parser.Parse(data, True)
        return self.results
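The parser above receives the whole document as one string. Since Expat is a stream parser, the same approach also works when the response is fed in chunks as it arrives, for example from an HTTP response object. The sketch below is a variant written for this purpose and is not part of the original code. `parse_sparql_xml_stream`, the chunk size and the sample document are illustrative assumptions. Like the original, it ignores `xml:lang` and datatype attributes.

```python
import io
import xml.parsers.expat

VALUE_ELEMENTS = ('literal', 'uri', 'bnode')

def parse_sparql_xml_stream(stream, chunk_size=65536):
    """Parse SPARQL XML results from a binary file-like object, chunk by chunk.

    Returns a list of dicts mapping variable name -> {'value': ..., 'type': ...}.
    """
    results = []
    state = {'row': {}, 'name': None, 'type': None, 'chars': [], 'in_value': False}

    def start(name, attrs):
        if name == 'binding':
            state['name'] = attrs['name']
        elif name in VALUE_ELEMENTS:
            state['type'], state['chars'], state['in_value'] = name, [], True

    def end(name):
        if name in VALUE_ELEMENTS:
            state['in_value'] = False
        elif name == 'binding':
            state['row'][state['name']] = {'value': ''.join(state['chars']),
                                           'type': state['type']}
        elif name == 'result':
            results.append(state['row'])
            state['row'] = {}

    def chars(data):
        # Text may arrive split across several callbacks (and chunks)
        if state['in_value']:
            state['chars'].append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.Parse(chunk, False)  # more data may follow
    parser.Parse(b'', True)         # no more data: finish the document
    return results

SAMPLE = b'''<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="country"/><variable name="altname"/></head>
  <results>
    <result>
      <binding name="country"><uri>http://sws.geonames.org/2635167/</uri></binding>
      <binding name="altname"><literal xml:lang="en">UK</literal></binding>
    </result>
  </results>
</sparql>'''

if __name__ == '__main__':
    # A tiny chunk size forces element names and values to be split across reads
    print(parse_sparql_xml_stream(io.BytesIO(SAMPLE), chunk_size=7))
```

Passing `False` for every chunk and a final empty `True` call lets Expat buffer incomplete tokens between reads, so tags and values split across chunk boundaries are still reported correctly.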

Data Journalism and its role for open government data

Simon Rogers, editor of The Guardian’s Datablog, last week posted a top 10 list of data.gov.uk datasets, ranked by how relevant they could be to people, highlighting a number of very interesting data sets. He featured national transport statistics, a massive data set cataloging not only every bus, rail and coach stop and pier in the UK but also every bus, train, tram and ferry that stopped at them during a week in October; as well as statistics on government spending (COINS), UK labour market and employment statistics by year, youth perspectives and attitudes by region, and statistics on dog mess by UK region. For each he gives a short synopsis of the contents of the data set, highlights its potential uses, and describes the problems and limitations of the data.

This post is tremendously useful, not merely because it highlights a tiny, delicious morsel from a rather immense soup of more than 4,223 datasets on data.gov.uk, but because it helps make the data relevant to people: he describes, in easily understandable terms, what is in each data set, why the data is relevant or interesting and, most importantly, ways in which it can be put to use.

The chasm between publishing and use is currently large and daunting. Many of the data sets are in “raw” form: Excel spreadsheets created by public servants (using highly specialized government vocabulary) or immense, multi-gigabyte CSV files with little supporting documentation. What we see now is a gold rush, on both sides of the Atlantic (data.gov and data.gov.uk), of citizen-hackers who are downloading this data, writing scripts to parse it, and generating visualisations and apps that make it possible for end-users to actually use it in various ways.

But the Guardian Datablog highlights that this might be an ideal role for journalism as well. While citizen-hackers have been effective at rolling out mash-ups that let everyday people get at the data, it still takes a journalist or reporter to get the tasty bits out of it: to transform the raw bits into information by adding speculation and perspective, to put it in the context of world and current events, and to weave it into a story that leads people to question what the data say about the ways they live each day.

These data journalists, of course, do not have to come from Big Media (TV, newspapers) as such; the ones that do just happen to be best equipped with the right set of skills. In the future, it would be interesting to see whether the many emerging sense-making and visualisation tools, such as ManyEyes, Google Fusion Tables, Freebase Gridworks, and our own work, enAKTing’s GEORDI browser (forthcoming), could make data journalism more accessible to citizens without a background in statistical data analysis or a journalism degree. If so, these tools could unleash masses of newly equipped citizen-journalists on the terabytes of open data now publicly available, so that it can be more immediately transformed into information that can start to make a difference in people’s lives.

A linked data web of a million easy pieces

(The following article is an op-ed piece and does not reflect the views of EnAKTing as a whole.)

The community of developers and researchers working on the Web of Linked Data is an extraordinary group of talented hackers. But anybody who is a member of the community quickly runs into the same problem: the number of tools at our disposal is minuscule compared to the massive quantity of “Web 2.0” core software tools and development frameworks. As a result, we often resort to building things from scratch, over and over again.

This becomes painfully clear when attempting to teach a new generation of software architects (e.g., university undergraduates) how to build Linked Data systems. “Is <tool> (e.g., rdflib) really the only triple store for Python?” Well, no, I reply: there are half a dozen others, but they are mostly long abandoned, woefully incomplete, unstable, buggy, or over-engineered and too complicated to use; or maybe there are others, but they are insufficiently advertised and unknown. How many server frameworks are there for Web 2.0 sites, by comparison? Nearly 300, and growing. How do you find them? In the Wikipedia article entitled “Web frameworks”. What is their adoption rate? Massive. How good are they? The best of them power web sites such as Twitter, WordPress, and so on: good software.

It is thus no wonder that hundreds of Web 2.0 developers are born every second, while the number of new Linked Data systems grows by only dozens each year. But not for long: we are on the brink of a phase change, one that involves the Linked Data community adopting Web 2.0 culture as much as it involves the Web 2.0 community learning about rich data representations.

The Linked Data community is picking up one important lesson from Web 2.0: simplicity. The defining feature of a successful Web 2.0 framework or toolkit is how easy it is to understand and use. Simplicity begets understandability: a tool designed to do only one thing is easy to understand. The second lesson is robustness, since nobody wants to rely on a system that is buggy or incomplete. These two priorities have pushed the best tools of Web 2.0 (e.g., server frameworks such as Django or client-side APIs such as jQuery) to become the most widely disseminated and re-used code on the planet. Out of these reusable bricks have grown the thousands of random Web 2.0-style social networking and sharing sites we have today.

The tradition the Linked Data community is starting to leave behind, meanwhile, is that of building massive, opaque, integrated systems. Without wanting to name any here explicitly, one can fairly easily point to massive Linked Data systems that never gained a single real user because they tried to do too much at once. The lack of ready-made, robust tools means that most of these systems started by reinventing the platform: reimplementing triple stores, RDF parsers and APIs, and DL reasoners before implementing the application or desired user interface. In Web 2.0, the equivalent would be designing a web site by rewriting one’s own HTTP server, web framework or templating language from the ground up, an obviously time-consuming exercise requiring substantial software development experience.

For the community to move towards the model of building Linked Data applications out of easy pieces, as Web 2.0 does, we need to encourage the development of tools and services that are 1) useful, 2) simple to use, and 3) reliable. There are so many difficult integration and representation challenges in Linked Data development that the first requirement (finding a need) is trivial. The latter two, on the other hand, require considerable thought, design and, as with any tool for real human users, testing with real developers, iteration and feedback.

One of the core tenets of enAKTing has been to develop simple and essential software and services for the Linked Data web that are easy for all Linked Data developers to use, from casual to expert. While we are far from perfecting these tools, we’ve already seen considerable demand for several of these services: for example, our coreference (“sameAs”) service, which can be used to find whether two concepts are equivalent, and our Backlink service, which can be used to find incoming links to concepts on the distributed web of data. Our JavaScript-SPARQL proxy makes it possible for client-side code to query SPARQL endpoints directly, getting around restrictions such as the browser’s same-origin policy. We have a host of other new services coming down the pipeline* that, together, we hope will help usher in new, innovative Linked Data applications and developers.
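A coreference service has to answer whether two URIs belong to the same equivalence bundle even when they are linked only indirectly, through a chain of owl:sameAs statements. One standard way to maintain such bundles is a union-find (disjoint-set) structure. The Python sketch below illustrates that idea; it is not the implementation or API of the enAKTing sameAs service, and the class name and example URIs are hypothetical.

```python
class SameAsIndex:
    """Group URIs into equivalence ("coreference") bundles using union-find."""

    def __init__(self):
        self.parent = {}  # URI -> parent URI; a bundle's root points to itself

    def _find(self, uri):
        self.parent.setdefault(uri, uri)
        root = uri
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every URI on the path straight at the root
        while self.parent[uri] != root:
            self.parent[uri], uri = root, self.parent[uri]
        return root

    def add_same_as(self, a, b):
        """Record the statement: a owl:sameAs b."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self.parent[ra] = rb

    def equivalent(self, a, b):
        return self._find(a) == self._find(b)

    def bundle(self, uri):
        """All known URIs in the same bundle as uri, sorted."""
        root = self._find(uri)
        return sorted(u for u in list(self.parent) if self._find(u) == root)
```

For example, after recording that A sameAs B and C sameAs B, `equivalent(A, C)` is true even though no statement links A and C directly.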

But we can’t do it alone. What tools are you developing? What would you like to see? Let us know.

* Please see The enAKTing Services for a list of services we are currently developing.


Reasoning for the Semantic Web with CLIPS

There is a pressing need in the Semantic Web community to exploit data semantics coming from different hub contexts (e.g. geographic information, personal profiles, place and time) and to put that knowledge to work in ad hoc functionalities and services. The ontological content present in data silos such as http://data.gov.uk or http://ordnancesurvey.co.uk must first be aligned to a common schema, which can then be used to design functionalities and further knowledge bases containing more personalised concepts, such as good locations for a meal, news that interests me, or the timetable I need to know.

In classic AI, rule-based systems are a natural way to express operative knowledge about a domain (even an applied one), and they extend the expressive power of classical representation frameworks such as DL ontologies. Reasoning with distributed RDF instances can be daunting, and integrating existing tools can be tricky. CLIPS is an efficient tool for reasoning with rules that benefits from years of experience and a very efficient RETE implementation. Integrating a CLIPS engine with an existing application is quite easy: for Java there is Jess, a rule engine derived from CLIPS, while bridges are provided for other languages.
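To make the idea of rules over RDF-style facts concrete, here is a deliberately naive forward-chaining sketch in Python: it reapplies every rule to the whole fact set until nothing new is derived. This is not CLIPS and does not use RETE, which exists precisely to avoid this repeated re-matching. The triples, predicates and rule functions are all hypothetical.

```python
def forward_chain(facts, rules):
    """Apply rules to a set of (subject, predicate, object) triples until a
    fixpoint is reached. Each rule maps the current fact set to derived triples."""
    facts = set(facts)  # do not modify the caller's collection
    while True:
        new = set()
        for rule in rules:
            new |= set(rule(facts)) - facts
        if not new:
            return facts
        facts |= new

def located_in_transitive(facts):
    # ?x locatedIn ?y, ?y locatedIn ?z  =>  ?x locatedIn ?z
    located = [(s, o) for s, p, o in facts if p == 'locatedIn']
    return {(x, 'locatedIn', z) for x, y in located for y2, z in located if y == y2}

def place_to_eat(facts):
    # ?x type Restaurant, ?x locatedIn ?area  =>  ?area hasPlaceToEat ?x
    restaurants = {s for s, p, o in facts if p == 'type' and o == 'Restaurant'}
    return {(o, 'hasPlaceToEat', s) for s, p, o in facts
            if p == 'locatedIn' and s in restaurants}
```

Given a restaurant in district A and district A in city B, the engine first derives that the restaurant is in city B, and only in the next round that city B has a place to eat. A RETE network would instead cache partial matches and process only the changed facts, which is what makes engines such as CLIPS practical on large fact bases.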


Welcome to EnAKTing

Welcome to the EPSRC-funded “EnAKTing project”, looking at challenges of the unbounded data Web.

The development of new Semantic Web technologies points to a new generation of Web capability that can explore and query, assemble, and integrate content in a context-aware, focused fashion. The basic idea is that we move from a document-centric view of the Web to one in which data and information are the principal objects of interest.
With the emergence of a Web of data it is essential to address three key research problems, viz. (1) how to quickly build ontologies that can exploit the potential of large-scale user participation, (2) how to query an unbounded web of linked data, and (3) how to visualise, explore, browse and navigate this mass of data.

In the EnAKTing project, we are undertaking fundamental research in the above three areas.