
Added value of linking data (Polish economy data)

Recently I2G has been working on publishing Polish economy data as linked data. One of the datasets covers Poland's imports and exports (INSIGOS HZ/GEO). Its geographical dimension lists 210 countries. Standardizing this vocabulary was part of the publication effort: each country was linked to DBpedia and assigned two- and three-letter codes. A vocabulary prepared this way is a natural candidate for linking various datasets.

The INSIGOS HZ/GEO dataset contains absolute values of imports and exports. As a general rule, exports to “bigger” countries have greater value. One may wish to normalize this value to see how much of the exported products a country really consumes relative to its size. One option is to calculate the value of exports per capita, i.e. to divide the value of exports by the population. Normally, collecting population figures and combining them with INSIGOS data would be very time-consuming. We can, however, use external data.
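To make the normalization concrete, here is a minimal Python sketch. It is only an illustration, not part of the I2G tooling: the figures for Germany and Estonia are copied from the results table at the end of this post, and the names `exports_eur`, `population` and `per_capita` are made up for the example.

```python
# Export values (EUR, 2012) and populations, copied from the results table
# at the end of this post.
exports_eur = {"DE": 26423673109, "EE": 625339113}
population = {"DE": 81879976, "EE": 1307605}


def per_capita(export_value, inhabitants):
    """Return the export value divided by the population."""
    return export_value / inhabitants


normalized = {cc: per_capita(exports_eur[cc], population[cc]) for cc in exports_eur}

# Rank the countries once by absolute value and once per capita.
by_absolute = sorted(exports_eur, key=exports_eur.get, reverse=True)
by_per_capita = sorted(normalized, key=normalized.get, reverse=True)

print("by absolute value:", by_absolute)
print("by value per capita:", by_per_capita)
```

Germany leads by absolute value, while Estonia leads once the values are divided by population, which is exactly the effect the normalization is meant to reveal.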

There are two ways to get the population from another source. To get it from DBpedia, one can issue the following SPARQL query at http://dbpedia.org/sparql/ (we should get 209 countries with 222 labels; the population is returned as xsd:integer):

 

SELECT DISTINCT ?country ?label ?population
WHERE {
  ?country a <http://dbpedia.org/ontology/Country> .
  ?country rdfs:label ?label
  FILTER (lang(?label) = "en") .
  ?country <http://dbpedia.org/property/populationCensus> ?population .
}
ORDER BY ?label
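For readers who prefer to run such a query from a script rather than from the endpoint's web form, the following Python sketch uses only the standard library and the standard SPARQL Protocol (a GET request with a `query` parameter and a JSON `Accept` header). The helper names `build_request` and `run_select` are invented for this example. Like the query above, it assumes the endpoint predefines the `rdfs:` prefix, and whether the endpoint is reachable and answers in JSON is also an assumption.

```python
import json
import urllib.parse
import urllib.request


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(endpoint, query, timeout=30):
    """Send the query and return the list of result bindings."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


QUERY = """
SELECT DISTINCT ?country ?label ?population
WHERE {
  ?country a <http://dbpedia.org/ontology/Country> .
  ?country rdfs:label ?label
  FILTER (lang(?label) = "en") .
  ?country <http://dbpedia.org/property/populationCensus> ?population .
}
ORDER BY ?label
"""

request = build_request("http://dbpedia.org/sparql/", QUERY)
print(request.get_method(), request.full_url[:40] + "...")

# To actually execute the query (requires network access):
# rows = run_select("http://dbpedia.org/sparql/", QUERY)
```

Only the request is built here; the network call is left commented out so the sketch can be tried offline.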

 

Another possibility is to get the data from LinkedGeoData by issuing the following SPARQL query at http://linkedgeodata.org/sparql :

 

PREFIX lgdo: <http://linkedgeodata.org/ontology/>
PREFIX lgdp: <http://linkedgeodata.org/property/>
SELECT *
WHERE {
  ?country a lgdo:Country .
  ?country lgdp:country_code_iso3166_1_alpha_2 ?cc .
  ?country lgdo:population ?population .
  ?country rdfs:label ?label FILTER (lang(?label) = "en") .
}
ORDER BY ?label

 

Only 92 countries are returned. Note that not all countries have a population entered, and not all countries have labels, so a more appropriate query makes both patterns optional:

 

PREFIX lgdo: <http://linkedgeodata.org/ontology/>
PREFIX lgdp: <http://linkedgeodata.org/property/>
SELECT *
WHERE {
  ?country a lgdo:Country .
  ?country lgdp:country_code_iso3166_1_alpha_2 ?cc .
  OPTIONAL { ?country lgdo:population ?population } .
  OPTIONAL { ?country rdfs:label ?label FILTER (lang(?label) = "en") } .
}
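With OPTIONAL, some rows come back without a population or a label. In the W3C SPARQL JSON results format an unbound variable is simply left out of the row, so client code has to allow for missing keys. The Python sketch below illustrates this on a hand-written sample: the first row reuses values from this post, while the second row (code "XX", example.org URI) is hypothetical and only shows what a country without a population would look like. The helper name `value_or_none` is invented for the example.

```python
import json

# Hand-written sample in the SPARQL JSON results format.
# Row 1 uses values from this post; row 2 is hypothetical.
SAMPLE = json.loads("""
{
  "head": {"vars": ["country", "cc", "population"]},
  "results": {"bindings": [
    {"country": {"type": "uri",
                 "value": "http://linkedgeodata.org/triplify/node432425060"},
     "cc": {"type": "literal", "value": "PL"},
     "population": {"type": "literal", "value": "38186860"}},
    {"country": {"type": "uri",
                 "value": "http://example.org/hypothetical-country"},
     "cc": {"type": "literal", "value": "XX"}}
  ]}
}
""")


def value_or_none(binding, variable):
    """Return the value bound to `variable`, or None if it was left unbound."""
    term = binding.get(variable)
    return None if term is None else term["value"]


rows = [
    (value_or_none(b, "cc"), value_or_none(b, "population"))
    for b in SAMPLE["results"]["bindings"]
]
missing = [cc for cc, pop in rows if pop is None]

print(rows)
print("countries without population:", missing)
```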

 

We will now demonstrate a federated query that combines our dictionary with population data from an external source:

 

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX lgdo: <http://linkedgeodata.org/ontology/>
PREFIX lgdp: <http://linkedgeodata.org/property/>
SELECT DISTINCT *
WHERE {
  SERVICE <http://i2g.unixstorm.org/sparql> {
    GRAPH <http://data.i2g.pl/dic/country> {
      ?country1 a skos:Concept .
      ?country1 skos:notation ?cc1 .
    }
  }
  SERVICE <http://linkedgeodata.org/sparql> {
    ?country2 a lgdo:Country .
    ?country2 lgdp:country_code_iso3166_1_alpha_2 ?cc1 .
    ?country2 lgdo:population ?population .
  }
}

 

Technical note: note the GRAPH keyword. It is present in the first SERVICE block and absent from the second, and it has to be that way; otherwise the query does not work as expected. Even though LinkedGeoData has a single default graph, that graph must not be named in the query.

 

An excerpt from the results is shown in the table below:

 

country1                           cc1  country2                                         population
http://data.i2g.pl/dic/country#RS  RS   http://linkedgeodata.org/triplify/node369137525  10159046
http://data.i2g.pl/dic/country#AL  AL   http://linkedgeodata.org/triplify/node424310601  3619778
http://data.i2g.pl/dic/country#PL  PL   http://linkedgeodata.org/triplify/node432425060  38186860
http://data.i2g.pl/dic/country#AD  AD   http://linkedgeodata.org/triplify/node148874198  83888
http://data.i2g.pl/dic/country#AT  AT   http://linkedgeodata.org/triplify/node26847709   8205533
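The two SERVICE blocks are joined on the shared variable ?cc1: only country codes that occur in both result sets survive. The Python sketch below imitates that join locally. It is a simplified illustration, not how a SPARQL engine is implemented. The rows for PL and AT are taken from the excerpt above; the "ZZ" vocabulary entry is hypothetical and is there only to show a row that finds no match.

```python
# Rows imitating what each SERVICE block contributes.
# PL and AT come from the excerpt above; "ZZ" is a hypothetical entry
# with no counterpart in the population data.
vocabulary = [
    ("http://data.i2g.pl/dic/country#PL", "PL"),
    ("http://data.i2g.pl/dic/country#AT", "AT"),
    ("http://data.i2g.pl/dic/country#ZZ", "ZZ"),
]
populations = [
    ("http://linkedgeodata.org/triplify/node432425060", "PL", 38186860),
    ("http://linkedgeodata.org/triplify/node26847709", "AT", 8205533),
]

# Index the second result set by country code, then keep only the matches.
by_code = {cc: (node, pop) for node, cc, pop in populations}
joined = [
    (concept, cc, *by_code[cc])
    for concept, cc in vocabulary
    if cc in by_code
]

for row in joined:
    print(row)
```

The "ZZ" entry drops out of the result, just as a country without a matching code and population drops out of the federated query.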

 

 

Now we can combine our observations with population data in a single query and, at the same time, calculate the value of exports per capita.

We first present a sample observation from our dataset, defined in accordance with the Data Cube Vocabulary (Turtle):

 

<http://data.i2g.pl/insigos/hz/geo/export/AT/2012/EUR>
  a qb:Observation ;
  sdmx-measure:obsValue "1987322489" ;
  prop:indicator "export" ;
  prop:year "2012" ;
  prop:country "AT" ;
  prop:unit "EUR" .

 

In other words, exports from Poland to Austria in 2012, denominated in euro, amounted to 1,987,322,489.

 

Assume that we want to make a chart for export to all countries in 2012, denominated in Euro, calculated ‘per capita’. The following federated SPARQL query makes it possible:

 

PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX prop: <http://data.i2g.pl/insigos/properties/>
PREFIX sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#>
PREFIX lgdo: <http://linkedgeodata.org/ontology/>
PREFIX lgdp: <http://linkedgeodata.org/property/>
PREFIX op: <http://www.w3.org/2002/08/xquery-operators#>
SELECT ?obs ?cc ?name ?export ?population (bif:atof(?export) / ?population AS ?percapita)
WHERE {
  SERVICE <http://i2g.unixstorm.org/sparql> {
    GRAPH <http://data.i2g.pl/insigos/hz/geo> {
      ?obs a qb:Observation .
      ?obs prop:indicator "export" .
      ?obs prop:year "2012" .
      ?obs prop:unit "EUR" .
      ?obs prop:country ?cc .
      ?obs sdmx-measure:obsValue ?export .
    }
  }
  SERVICE <http://linkedgeodata.org/sparql> {
    ?country a lgdo:Country .
    ?country lgdp:country_code_iso3166_1_alpha_2 ?cc .
    ?country lgdo:population ?population .
    ?country rdfs:label ?name .
    FILTER (lang(?name) = "en")
  }
}
ORDER BY DESC(?percapita)

Technical note: note how the ?percapita variable is calculated. obsValue does not seem to have any datatype attached, so we have to convert it to a number explicitly with bif:atof(?export), a Virtuoso built-in function.
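The same issue shows up in any client that receives the untyped value: it arrives as text and has to be converted before it can be used in arithmetic. The short Python sketch below mirrors what bif:atof does in the query, using the Antigua and Barbuda figures from this post. The variable names are invented for the example.

```python
export_raw = "81011744"  # untyped literal: arrives as plain text
population = 69842       # already a number

# Dividing the text value directly is an error, much like in the query.
try:
    export_raw / population
except TypeError as err:
    print("cannot divide text by a number:", err)

# Explicit conversion, analogous to bif:atof(?export) in the query.
per_capita = float(export_raw) / population
print(round(per_capita, 2))
```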

 

The first 10 countries with the highest ‘per capita’ indicator are presented in the table below:

 

obs                                                   cc  name                      export       population  percapita
http://data.i2g.pl/insigos/hz/geo/export/AG/2012/EUR  AG  "Antigua and Barbuda"@en  81011744     69842       1159.93
http://data.i2g.pl/insigos/hz/geo/export/EE/2012/EUR  EE  "Estonia"@en              625339113    1307605     478.232
http://data.i2g.pl/insigos/hz/geo/export/LT/2012/EUR  LT  "Lithuania"@en            1649783326   3565205     462.746
http://data.i2g.pl/insigos/hz/geo/export/NO/2012/EUR  NO  "Norway"@en               1686482282   4785200     352.437
http://data.i2g.pl/insigos/hz/geo/export/DK/2012/EUR  DK  "Denmark"@en              1843725234   5475791     336.705
http://data.i2g.pl/insigos/hz/geo/export/DE/2012/EUR  DE  "Germany"@en              26423673109  81879976    322.712
http://data.i2g.pl/insigos/hz/geo/export/SE/2012/EUR  SE  "Sweden"@en               2761330775   9276509     297.669
http://data.i2g.pl/insigos/hz/geo/export/NL/2012/EUR  NL  "The Netherlands"@en      4711961466   16645313    283.08
http://data.i2g.pl/insigos/hz/geo/export/LU/2012/EUR  LU  "Luxembourg"@en           136694136    493500      276.989
http://data.i2g.pl/insigos/hz/geo/export/HU/2012/EUR  HU  "Hungary"@en              2450433404   9930915     246.748

 

Another added value of interlinking is the possibility to switch label languages very quickly. One can run the same query with Japanese labels even though we do not have such resources in our vocabulary. It is sufficient to substitute “en” with “ja”.
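As a small illustration of the substitution, the Python sketch below builds the LinkedGeoData label query for any language tag. It is a sketch under assumptions: the function name `labels_query` and the template are invented for this example, the query is a reduced version of the ones above, and whether labels exist for a given language depends on the data at the endpoint.

```python
# Reduced label query; doubled braces are literal braces for str.format.
QUERY_TEMPLATE = """
PREFIX lgdo: <http://linkedgeodata.org/ontology/>
PREFIX lgdp: <http://linkedgeodata.org/property/>
SELECT ?cc ?name
WHERE {{
  ?country a lgdo:Country .
  ?country lgdp:country_code_iso3166_1_alpha_2 ?cc .
  ?country rdfs:label ?name .
  FILTER (lang(?name) = "{lang}")
}}
ORDER BY ?name
"""


def labels_query(lang):
    """Return the label query for a language tag such as "en" or "ja"."""
    # Accept only simple tags (letters/digits, optionally hyphen-separated)
    # so that the tag cannot break out of the quoted string in the query.
    if not lang or not all(part.isalnum() for part in lang.split("-")):
        raise ValueError("unexpected language tag: %r" % lang)
    return QUERY_TEMPLATE.format(lang=lang)


print(labels_query("ja"))
```

Calling `labels_query("en")` yields the English variant; nothing else in the query changes.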

Posted in Misc, WP9 – LOD2 for Citizen – PublicData.eu
