Automatically extracting form labels

Hoa Nguyen, Eun Yong Kang, Juliana Freire

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe a machine-learning-based approach for extracting attribute labels from Web form interfaces. Having these labels is a requirement for several techniques that attempt to retrieve and integrate data that reside in online databases and that are hidden behind form interfaces, including schema matching and clustering, and hidden-Web crawlers. Whereas previous approaches to this problem have relied on heuristics and manually specified extraction rules, our technique makes use of learning classifiers to identify form labels. Our preliminary experiments show this approach is promising and has high accuracy.

Original language: English (US)
Title of host publication: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
Pages: 1498-1500
Number of pages: 3
DOIs: https://doi.org/10.1109/ICDE.2008.4497602
State: Published - 2008
Event: 2008 IEEE 24th International Conference on Data Engineering, ICDE'08 - Cancun, Mexico
Duration: Apr 7 2008 → Apr 12 2008

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627

Other

Other: 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
Country: Mexico
City: Cancun
Period: 4/7/08 → 4/12/08

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

  • Cite this

    Nguyen, H., Kang, E. Y., & Freire, J. (2008). Automatically extracting form labels. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08 (pp. 1498-1500). [4497602] (Proceedings - International Conference on Data Engineering). https://doi.org/10.1109/ICDE.2008.4497602