Swisslink: High-precision, context-free entity linking exploiting unambiguous labels

Roman Prokofyev, Michael Luggen, Djellel Eddine Difallah, Philippe Cudré-Mauroux

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Webpages are an abundant source of textual information with manually annotated entity links, and are often used as a source of training data for a wide variety of machine learning NLP tasks. However, manual annotations such as those found on Wikipedia are sparse, noisy, and biased towards popular entities. Existing entity linking systems deal with those issues by relying on simple statistics extracted from the data. While such statistics can effectively deal with noisy annotations, they introduce bias towards head entities and are ineffective for long-tail (i.e., unpopular) entities. In this work, we first analyze statistical properties linked to manual annotations by studying a large annotated corpus composed of all English Wikipedia webpages, in addition to all pages from the CommonCrawl containing English Wikipedia annotations. We then propose and evaluate a series of entity linking approaches, with the explicit goal of creating highly accurate (precision > 95%) and broad annotated corpora for machine learning tasks. Our results show that our best approach achieves maximal precision at usable recall levels, and outperforms both state-of-the-art entity-linking systems and human annotators.

Original language: English (US)
Title of host publication: Proceedings of the 13th International Conference on Semantic Systems, SEMANTiCS 2017
Editors: Rinke Hoekstra, Victor de Boer, Tassilo Pellegrini, Catherine Faron-Zucker
Publisher: Association for Computing Machinery
Pages: 65-72
Number of pages: 8
ISBN (Electronic): 9781450352963
State: Published - Sep 11 2017
Event: 13th International Conference on Semantic Systems, SEMANTiCS 2017 - Amsterdam, Netherlands
Duration: Sep 12 2017 to Sep 13 2017

Publication series

Name: ACM International Conference Proceeding Series
Volume: 2017-September

Conference

Conference: 13th International Conference on Semantic Systems, SEMANTiCS 2017
Country/Territory: Netherlands
City: Amsterdam
Period: 9/12/17 to 9/13/17

Keywords

  • Entity Linking
  • Machine learning
  • Manual annotations

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
