VoldemortKG: Mapping schema.org and web entities to linked open data

Alberto Tonon, Victor Felder, Djellel Eddine Difallah, Philippe Cudré-Mauroux

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Increasingly, webpages mix entities coming from various sources and represented in different ways. The same entity can thus be described both by schema.org annotations and by a text anchor pointing to its Wikipedia page. Often, these representations provide complementary information that goes unexploited because the entities are kept disjoint. We explored the extent to which entities represented in different ways repeat on the Web, how they are related, and how they complement (or link to) each other. Our initial experiments showed that we can unveil a previously unexploited knowledge graph by applying simple instance matching techniques to a large collection of schema.org annotations and Wikipedia. The resulting knowledge graph aggregates entities (often tail entities) scattered across several webpages, and complements existing Wikipedia entities with new facts and properties. To facilitate further investigation into how to mine such information, we are releasing (i) an excerpt of all Common Crawl webpages containing both Wikipedia and schema.org annotations, (ii) the toolset to extract this information and perform knowledge graph construction and mapping onto DBpedia, and (iii) the resulting knowledge graph (VoldemortKG) obtained via label matching techniques.
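
The label matching mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' released toolset: the function names, the tuple layouts, the normalization rules, and the restriction to entities and anchors found on the same page are all assumptions made for illustration, and the sample data is invented.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(label: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace.

    Hypothetical normalization; the actual rules used for VoldemortKG
    are not described in the abstract.
    """
    text = unicodedata.normalize("NFKD", label)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def match_by_label(schema_entities, wiki_anchors):
    """Pair schema.org entities with Wikipedia anchors on the same page
    whose normalized labels are identical.

    schema_entities: iterable of (page_url, name, properties)
    wiki_anchors:    iterable of (page_url, anchor_text, wikipedia_url)
    Both layouts are assumptions for this sketch.
    """
    anchors_by_key = defaultdict(set)
    for page_url, anchor_text, wiki_url in wiki_anchors:
        anchors_by_key[(page_url, normalize(anchor_text))].add(wiki_url)

    matches = []
    for page_url, name, properties in schema_entities:
        key = (page_url, normalize(name))
        for wiki_url in sorted(anchors_by_key.get(key, ())):
            # The schema.org properties become candidate new facts
            # about the matched Wikipedia entity.
            matches.append((wiki_url, properties))
    return matches


# Invented sample data, for illustration only.
schema_entities = [
    ("http://example.org/a", "Café de Flore", {"@type": "Restaurant"}),
    ("http://example.org/a", "Unrelated Shop", {"@type": "Store"}),
]
wiki_anchors = [
    ("http://example.org/a", "Cafe de Flore",
     "https://en.wikipedia.org/wiki/Caf%C3%A9_de_Flore"),
]
matches = match_by_label(schema_entities, wiki_anchors)
```

Exact matching on normalized labels keeps the sketch simple; it will miss abbreviations and alternative names, which a fuzzier similarity measure could recover at the cost of more false matches.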

Original language: English (US)
Title of host publication: The Semantic Web - ISWC 2016 - 15th International Semantic Web Conference, 2016, Proceedings
Editors: Marta Sabou, Freddy Lecue, Paul Groth, Elena Simperl, Markus Krotzsch, Alasdair Gray, Fabian Flock, Yolanda Gil
Publisher: Springer Verlag
Pages: 220-228
Number of pages: 9
ISBN (Print): 9783319465463
State: Published - 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9982 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Keywords

  • Data integration
  • Dataset
  • Instance matching
  • Knowledge graphs
  • Schema.org

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
