Towards large-scale unsupervised relation extraction from the Web

Bonan Min, Shuming Shi, Ralph Grishman, Chin Yew Lin

Research output: Contribution to journal › Article › peer-review

Abstract

The Web brings an open-ended set of semantic relations, and discovering the significant relation types is very challenging. Unsupervised algorithms have been developed to extract relations from a corpus without knowing the relation types in advance, but most rely on tagging arguments of predefined types. One recently reported system jointly extracts relations and their argument semantic classes, taking as input a set of relation instances extracted by an open IE (Information Extraction) algorithm. However, it cannot handle polysemy of relation phrases, and it fails to group many similar ("synonymous") relation instances because of the sparseness of features. In this paper, the authors present a novel unsupervised algorithm that provides a more general treatment of the polysemy and synonymy problems. The algorithm incorporates various knowledge sources, which the authors show to be very effective for unsupervised relation extraction. Moreover, it explicitly disambiguates polysemous relation phrases and groups synonymous ones. While maintaining approximately the same precision, the algorithm achieves a significant improvement in recall compared to the previous method. It is also very efficient: experiments on a real-world dataset show that it can handle 14.7 million relation instances and extract a very large set of relations from the Web.
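
To make the polysemy and synonymy problems concrete, here is a toy sketch in Python. It is not the authors' algorithm, and the abstract does not specify the knowledge sources or the clustering method used. The sketch assumes a hypothetical argument-class lookup (ARG_CLASS), splits each relation phrase into senses by the classes of its arguments, and then merges senses whose argument pairs overlap, using a Jaccard threshold chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: none of these names or instances come from the paper.
ARG_CLASS = {
    "Jane Austen": "person", "Mark Twain": "person",
    "Alice": "person", "Bob": "person",
    "Emma": "book", "Tom Sawyer": "book",
}

INSTANCES = [
    ("Jane Austen", "wrote", "Emma"),
    ("Mark Twain", "wrote", "Tom Sawyer"),
    ("Jane Austen", "is the author of", "Emma"),
    ("Mark Twain", "is the author of", "Tom Sawyer"),
    ("Alice", "wrote", "Bob"),  # "wrote" in the sense of "wrote to"
]


def split_senses(instances, arg_class):
    """Polysemy step (assumed): one sense per (phrase, argument-class signature)."""
    senses = defaultdict(set)
    for arg1, phrase, arg2 in instances:
        signature = (arg_class[arg1], arg_class[arg2])
        senses[(phrase, signature)].add((arg1, arg2))
    return dict(senses)


def jaccard(a, b):
    """Overlap between two non-empty sets of argument pairs."""
    return len(a & b) / len(a | b)


def group_synonyms(senses, threshold=0.5):
    """Synonymy step (assumed): merge senses that share a signature and
    enough argument pairs, using a small union-find structure."""
    parent = {sense: sense for sense in senses}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for s, t in combinations(senses, 2):
        same_signature = s[1] == t[1]
        if same_signature and jaccard(senses[s], senses[t]) >= threshold:
            parent[find(s)] = find(t)

    clusters = defaultdict(set)
    for sense in senses:
        clusters[find(sense)].add(sense[0])
    # Report each cluster as (argument-class signature, sorted phrases).
    return sorted((root[1], sorted(phrases)) for root, phrases in clusters.items())


if __name__ == "__main__":
    for signature, phrases in group_synonyms(split_senses(INSTANCES, ARG_CLASS)):
        print(signature, phrases)
```

In this toy data, "wrote" ends up in two clusters: one shared with "is the author of" for person-book arguments, and a separate one for person-person arguments. A real system would face the feature sparseness the abstract mentions, where synonymous phrases rarely share exact argument pairs, so exact-pair overlap is only a stand-in here.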

Original language: English (US)
Pages (from-to): 1-23
Number of pages: 23
Journal: International Journal on Semantic Web and Information Systems
Volume: 8
Issue number: 3
State: Published - Jul 2012

Keywords

  • Information extraction
  • Large-scale
  • Relation extraction
  • Semantics
  • Unsupervised learning
  • Web

ASJC Scopus subject areas

  • Information Systems
  • Computer Networks and Communications
