DocumentCode :
2227108
Title :
Learning information extraction patterns from tabular Web pages without manual labelling
Author :
Gao, Xiaoying ; Zhang, Mengjie ; Andreae, Peter
Author_Institution :
Sch. of Math. & Comput. Sci., Victoria Univ., Wellington, New Zealand
fYear :
2003
fDate :
13-17 Oct. 2003
Firstpage :
495
Lastpage :
498
Abstract :
We describe a domain-independent approach to automatically constructing information extraction patterns for semistructured Web pages. The approach was tested on three corpora containing a series of tabular Web sites from different domains and achieved a success rate of at least 80%. A significant strength of the system is that it can infer extraction patterns from a single training page and does not require any manual labeling of the training page.
Keywords :
Web sites; information retrieval; learning (artificial intelligence); pattern matching; automatic pattern generation; information extraction pattern learning; machine learning; manual labeling; semistructured data; tabular Web pages; training page; wrapper; Data mining; Databases; Humans; Information retrieval; Labeling; Machine learning; Pattern matching; Testing; Web pages; World Wide Web
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Intelligence, 2003. WI 2003. Proceedings. IEEE/WIC International Conference on
Print_ISBN :
0-7695-1932-6
Type :
conf
DOI :
10.1109/WI.2003.1241249
Filename :
1241249