Siphon++: A hidden-web crawler for keyword-based interfaces

Karane Vieira, Luciano Barbosa, Juliana Freire, Altigran Silva

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The hidden Web consists of data that is generally hidden behind form interfaces, and as such, it is out of reach for traditional search engines. With the goal of leveraging the high-quality information in this largely unexplored portion of the Web, in this paper, we propose a new strategy for automatically retrieving data hidden behind keyword-based form interfaces. Unlike previous approaches to this problem, our strategy adapts the query generation and selection by detecting features of the index. We describe a preliminary experimental evaluation which shows that our strategy is able to obtain coverages that are higher than those of previous approaches that use a fixed strategy for query generation.

Original language: English (US)
Title of host publication: Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM'08
Pages: 1361-1362
Number of pages: 2
State: Published - 2008
Event: 17th ACM Conference on Information and Knowledge Management, CIKM'08 - Napa Valley, CA, United States
Duration: Oct 26 2008 to Oct 30 2008

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Other

Other: 17th ACM Conference on Information and Knowledge Management, CIKM'08
Country: United States
City: Napa Valley, CA
Period: 10/26/08 to 10/30/08

Keywords

  • Hidden-Web crawler
  • Online databases

ASJC Scopus subject areas

  • Decision Sciences (all)
  • Business, Management and Accounting (all)
