Combining classifiers to identify online databases

Luciano Barbosa, Juliana Freire

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We address the problem of identifying the domain of online databases. More precisely, given a set F of Web forms automatically gathered by a focused crawler and an online database domain D, our goal is to select from F only the forms that are entry points to databases in D. Having a set of Web forms that serve as entry points to similar online databases is a requirement for many applications and techniques that aim to extract and integrate hidden-Web information, such as meta-searchers, online database directories, hidden-Web crawlers, and form-schema matching and merging.

We propose a new strategy that automatically and accurately classifies online databases based on features that can be easily extracted from Web forms. By judiciously partitioning the space of form features, this strategy allows the use of simpler classifiers that can be constructed using learning techniques that are better suited for the features of each partition. Experiments using real Web data in a representative set of domains show that the use of different classifiers leads to high accuracy, precision and recall. This indicates that our modular classifier composition provides an effective and scalable solution for classifying online databases.
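The abstract describes composing simpler classifiers, each built for its own partition of the form-feature space. The sketch below illustrates that idea under assumptions the abstract does not state: the split into structural features (counts of text boxes, selection lists, checkboxes, password fields) and textual features, the two-stage ordering (a structural filter that discards non-searchable forms, then a text classifier for domain D), and the learners (a decision stump and multinomial Naive Bayes) are hypothetical choices for illustration, not the authors' published configuration. `Form`, `Stump`, `NaiveBayes`, and `in_domain` are illustrative names, and the training data is invented.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Form:
    """Hypothetical form representation: widget counts plus visible text."""
    counts: dict  # e.g. {"text": 1, "select": 3, "password": 0}
    text: str


STRUCTURAL = ("text", "select", "checkbox", "password")


class Stump:
    """Stage 1 (hypothetical): a decision stump on structural counts that
    separates searchable forms from non-searchable ones (login, signup)."""

    def fit(self, forms, labels):
        best = None
        for feat in STRUCTURAL:
            for t in sorted({f.counts.get(feat, 0) for f in forms}):
                for sign in (1, -1):  # 1: count >= t, -1: count <= t
                    errors = sum((sign * (f.counts.get(feat, 0) - t) >= 0) != y
                                 for f, y in zip(forms, labels))
                    if best is None or errors < best[0]:
                        best = (errors, feat, t, sign)
        _, self.feat, self.t, self.sign = best
        return self

    def predict(self, form):
        return self.sign * (form.counts.get(self.feat, 0) - self.t) >= 0


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Stage 2 (hypothetical): multinomial Naive Bayes with Laplace smoothing
    over form text, deciding whether a searchable form belongs to domain D."""

    def fit(self, forms, labels):
        self.words = {True: Counter(), False: Counter()}
        self.docs = Counter(labels)
        for f, y in zip(forms, labels):
            self.words[y].update(tokenize(f.text))
        self.vocab = set(self.words[True]) | set(self.words[False])
        return self

    def log_score(self, form, y):
        total = sum(self.words[y].values())
        score = math.log((self.docs[y] + 1) / (sum(self.docs.values()) + 2))
        for w in tokenize(form.text):
            score += math.log((self.words[y][w] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, form):
        return self.log_score(form, True) > self.log_score(form, False)


def in_domain(form, stage1, stage2):
    """Composition: only forms accepted by stage 1 reach stage 2."""
    return stage1.predict(form) and stage2.predict(form)


# Toy training data (invented for illustration); domain D = used cars.
searchable = [
    (Form({"text": 1, "select": 3}, "make model year price used cars"), True),
    (Form({"text": 2, "select": 2}, "car make model zip code"), True),
    (Form({"text": 2, "select": 2}, "title author isbn books"), False),
    (Form({"text": 1, "select": 4}, "departure arrival city airfare date"), False),
]
non_searchable = [
    Form({"text": 1, "password": 1}, "login username password"),
    Form({"text": 1}, "subscribe email newsletter"),
]

stage1 = Stump().fit([f for f, _ in searchable] + non_searchable,
                     [True] * len(searchable) + [False] * len(non_searchable))
stage2 = NaiveBayes().fit([f for f, _ in searchable], [y for _, y in searchable])

print(in_domain(Form({"text": 1, "select": 3}, "used car make price"), stage1, stage2))     # True
print(in_domain(Form({"text": 1, "password": 1}, "login password"), stage1, stage2))      # False
print(in_domain(Form({"text": 1, "select": 2}, "search books by author"), stage1, stage2)) # False
```

Routing only stage-1-accepted forms to stage 2 means the text classifier is trained and applied on searchable forms alone, so each learner sees a narrower, more homogeneous input. This is a toy rendering of the abstract's point that partitioning the feature space lets each partition use a simpler, better-suited learner.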

Original language: English (US)
Title of host publication: 16th International World Wide Web Conference, WWW2007
Pages: 431-440
Number of pages: 10
State: Published - 2007
Event: 16th International World Wide Web Conference, WWW2007 - Banff, AB, Canada
Duration: May 8 2007 - May 12 2007

Publication series

Name: 16th International World Wide Web Conference, WWW2007

Other

Other: 16th International World Wide Web Conference, WWW2007
Country/Territory: Canada
City: Banff, AB
Period: 5/8/07 - 5/12/07

Keywords

  • Hidden web
  • Hierarchical classifiers
  • Learning classifiers
  • Online database directories
  • Web crawlers

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
