The LOD2 project is happy to announce the release of the DBpedia Knowledge Base Version 3.9.
Knowledge bases are playing an increasingly important role in enhancing the intelligence of Web and enterprise search and in supporting information integration as well as natural language processing. Today, most knowledge bases cover only specific domains, are created by relatively small groups of knowledge engineers, and are very costly to keep up to date as domains change. At the same time, Wikipedia has grown into one of the central knowledge sources of mankind, maintained by thousands of contributors.
The DBpedia project leverages this gigantic source of knowledge by extracting structured information from Wikipedia and by making this information accessible on the Web as a large, multilingual, cross-domain knowledge base.
The English version of the DBpedia knowledge base version 3.9 describes 4.0 million things, of which 3.22 million are classified in a consistent ontology, including 832,000 persons, 639,000 places (including 427,000 populated places), 372,000 creative works (including 116,000 music albums, 78,000 films and 18,500 video games), 209,000 organizations (including 49,000 companies and 45,000 educational institutions), 226,000 species and 5,600 diseases.
We provide localized versions of DBpedia in 119 languages. Together, these versions describe 24.9 million things, of which 16.8 million overlap (are interlinked) with concepts from the English DBpedia. The full DBpedia data set features labels and abstracts for 12.6 million unique things in 119 different languages, 24.6 million links to images, 27.6 million links to external web pages, 45.0 million external links into other RDF data sets, 67.0 million links to Wikipedia categories, and 41.2 million YAGO categories.
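As a quick sanity check on the coverage figures quoted above, the following minimal sketch computes two shares from the announced numbers: the fraction of English DBpedia things that are classified in the ontology, and the fraction of things across all language versions that are interlinked with the English DBpedia. The function name `share` is purely illustrative, and the inputs are the rounded figures from the text, so the results are approximate.

```python
def share(part: float, whole: float) -> str:
    """Format part/whole as a percentage with one decimal place."""
    return f"{part / whole:.1%}"

# Rounded figures (in millions) taken from the announcement text.
english_things = 4.0
english_classified = 3.22
all_language_things = 24.9
interlinked_with_english = 16.8

classified_share = share(english_classified, english_things)
interlinked_share = share(interlinked_with_english, all_language_things)

print("Classified in the ontology (English):", classified_share)  # 80.5%
print("Interlinked with English DBpedia:", interlinked_share)     # 67.5%
```

In other words, roughly four out of five things in the English edition carry an ontology class, and about two thirds of the things described across all language editions are linked to an English counterpart.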
Altogether, the DBpedia 3.9 release consists of 2.46 billion pieces of information (RDF triples), of which 470 million were extracted from the English edition of Wikipedia, 1.98 billion were extracted from other language editions, and about 45 million are links to external data sets.
Detailed statistics about the DBpedia data sets in 24 popular languages are provided at Dataset Statistics.
The most important improvements of the new DBpedia release compared to DBpedia 3.8 are:
1. the new release is based on updated Wikipedia dumps dating from March / April 2013 (the 3.8 release was based on dumps from June 2012), leading to an overall increase in the number of concepts in the English edition from 3.7 to 4.0 million things.
2. the DBpedia ontology has been enlarged and the number of infobox-to-ontology mappings has risen, leading to richer and cleaner concept descriptions.
3. we extended the DBpedia type system to also cover Wikipedia articles that do not contain an infobox.
4. we provide links pointing from DBpedia concepts to Wikidata concepts and updated the links pointing at YAGO concepts and classes, making it easier to integrate knowledge from these sources.
More information about DBpedia can be found at http://dbpedia.org/About as well as in the new overview article about the project.
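For readers who want to explore the knowledge base programmatically, here is a minimal sketch of how a request to the SPARQL endpoint could be assembled. It only builds the query string and the request URL and does not contact any server. The endpoint URL `http://dbpedia.org/sparql`, the `dbo:` namespace `http://dbpedia.org/ontology/`, and the `format` request parameter are assumptions not stated in this announcement, and the helper names `build_count_query` and `build_request_url` are hypothetical. A live endpoint may serve a different release, so counts need not match the 3.9 figures.

```python
from urllib.parse import urlencode

# Assumption: the public DBpedia SPARQL endpoint is served at this URL.
ENDPOINT = "http://dbpedia.org/sparql"


def build_count_query(class_name: str) -> str:
    """Return a SPARQL query counting resources typed with a DBpedia ontology class."""
    return (
        # Assumption: DBpedia ontology classes live in this namespace.
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT (COUNT(DISTINCT ?thing) AS ?n)\n"
        f"WHERE {{ ?thing a dbo:{class_name} }}"
    )


def build_request_url(query: str) -> str:
    """Encode the query as an HTTP GET URL (SPARQL protocol 'query' parameter)."""
    params = {
        "query": query,
        # Assumption: the endpoint accepts a 'format' parameter for JSON results.
        "format": "application/sparql-results+json",
    }
    return ENDPOINT + "?" + urlencode(params)


person_query = build_count_query("Person")
person_url = build_request_url(person_query)
print(person_query)
print(person_url)
```

The resulting URL can be fetched with any HTTP client; swapping `"Person"` for another ontology class name such as `"Place"` or `"Species"` would count resources of that type instead.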
Many thanks to:
- Jona Christopher Sahnwaldt (Freelancer funded by the University of Mannheim, Germany) for improving the DBpedia extraction framework, for extracting the DBpedia 3.9 data sets for all 119 languages, and for generating the updated RDF links to external data sets.
- All editors who contributed to the DBpedia ontology mappings via the Mappings Wiki.
- Heiko Paulheim (University of Mannheim, Germany) for inventing and implementing the algorithm to generate additional type statements for formerly untyped resources.
- The whole Internationalization Committee for pushing the DBpedia internationalization forward.
- Dimitris Kontokostas (University of Leipzig) for improving the DBpedia extraction framework and loading the new release onto the DBpedia download server in Leipzig.
- Volha Bryl (University of Mannheim, Germany) for generating the statistics about the new release.
- Petar Ristoski (University of Mannheim, Germany) for generating the updated links pointing at the GADM database of Global Administrative Areas.
- Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink Software) for loading the new data set into the Virtuoso instance that serves the Linked Data view and SPARQL endpoint.
- OpenLink Software (http://www.openlinksw.com/) as a whole for providing the server infrastructure for DBpedia.
- Julien Cojan, Andrea Di Menna, Ahmed Ktob, Julien Plu, Jim Regan and others who contributed improvements to the DBpedia extraction framework via the source code repository on GitHub.