TY - GEN
T1 - Automatically extracting form labels
AU - Nguyen, Hoa
AU - Kang, Eun Yong
AU - Freire, Juliana
PY - 2008
Y1 - 2008
AB - We describe a machine-learning-based approach for extracting attribute labels from Web form interfaces. Having these labels is a requirement for several techniques that attempt to retrieve and integrate data that reside in online databases and that are hidden behind form interfaces, including schema matching and clustering, and hidden-Web crawlers. Whereas previous approaches to this problem have relied on heuristics and manually specified extraction rules, our technique makes use of learning classifiers to identify form labels. Our preliminary experiments show this approach is promising and has high accuracy.
UR - http://www.scopus.com/inward/record.url?scp=52649109075&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=52649109075&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2008.4497602
DO - 10.1109/ICDE.2008.4497602
M3 - Conference contribution
AN - SCOPUS:52649109075
SN - 9781424418374
T3 - Proceedings - International Conference on Data Engineering
SP - 1498
EP - 1500
BT - Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
T2 - 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
Y2 - 7 April 2008 through 12 April 2008
ER -