DocumentCode :
3325050
Title :
Automatically Extracting Form Labels
Author :
Nguyen, Hoa ; Kang, Eun Yong ; Freire, Juliana
Author_Institution :
Sch. of Comput., Univ. of Utah, Salt Lake City, UT
fYear :
2008
fDate :
7-12 April 2008
Firstpage :
1498
Lastpage :
1500
Abstract :
We describe a machine-learning-based approach for extracting attribute labels from Web form interfaces. Having these labels is a requirement for several techniques that attempt to retrieve and integrate data that reside in online databases and that are hidden behind form interfaces, including schema matching and clustering, and hidden-Web crawlers. Whereas previous approaches to this problem have relied on heuristics and manually specified extraction rules, our technique makes use of learning classifiers to identify form labels. Our preliminary experiments show this approach is promising and has high accuracy.
Keywords :
information retrieval; learning (artificial intelligence); Web form interface; attribute label extraction; hidden-Web crawler; machine learning; online databases; schema matching; Cities and towns; Crawlers; Data mining; Databases; Engines; HTML; Humans; Information retrieval; Partial response channels; Web search
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
Conference_Location :
Cancun
Print_ISBN :
978-1-4244-1836-7
Electronic_ISBN :
978-1-4244-1837-4
Type :
conf
DOI :
10.1109/ICDE.2008.4497602
Filename :
4497602