Querying Wikipedia documents and relationships

Huong Nguyen, Thanh Nguyen, Hoa Nguyen, Juliana Freire

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Wikipedia has become an important and rapidly growing source of information. However, the existing infrastructure for querying this information is limited and often ignores the inherent structure in the information and the links across documents. In this paper, we present a new approach for querying Wikipedia content that supports a simple yet expressive query interface allowing both keyword and structured queries. A unique feature of our approach is that, besides returning documents that match the queries, it also exploits relationships among documents to return richer, multi-document answers. We model Wikipedia as a graph and cast the problem of finding answers for queries as graph search. To guide the answer-search process, we propose a novel weighting scheme to identify important nodes and edges in the graph. By leveraging the structured information available in infoboxes, our approach supports queries that specify constraints over this structure, and we propose a new search algorithm to support these queries. We evaluate our approach using a representative subset of Wikipedia documents and present results that show our approach is effective and derives high-quality answers.

Original language: English (US)
Title of host publication: Proceedings of the 13th International Workshop on the Web and Databases, WebDB 2010, Co-located with ACM SIGMOD 2010
Publisher: Association for Computing Machinery
ISBN (Print): 9781450301862
State: Published - 2010
Event: 13th International Workshop on the Web and Databases, WebDB 2010, Co-located with ACM SIGMOD 2010 - Indianapolis, IN, United States
Duration: Jun 6 2010 → Jun 6 2010

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Other

Other: 13th International Workshop on the Web and Databases, WebDB 2010, Co-located with ACM SIGMOD 2010
Country/Territory: United States
City: Indianapolis, IN
Period: 6/6/10 → 6/6/10

ASJC Scopus subject areas

  • Software
  • Information Systems
