DocumentCode :
2860336
Title :
Tree-Structured Template Generation for Web Pages
Author :
Chuang, Shui-Lung ; Hsu, Jane Yung-jen
Author_Institution :
Academia Sinica, Taiwan
fYear :
2004
fDate :
20-24 Sept. 2004
Firstpage :
327
Lastpage :
333
Abstract :
As the Web becomes an increasingly important source of information, tools for modeling, searching, and extracting information from Web pages are indispensable. By modeling the structure of a Web page defined by its markup tags, one can easily extract target information using structural templates. This paper introduces the Tree Template Automatic Generator (TTAG), which learns tree-structured templates from training Web pages. TTAG was applied to both query-based and frequently updated Web sites, and produced effective templates from a small number of examples. The experiments show that TTAG is a powerful extraction tool for semi-structured information sources.
Keywords :
Automata; Data mining; Databases; HTML; Information resources; Information science; Internet; Power generation; Seminars; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
Print_ISBN :
0-7695-2100-2
Type :
conf
DOI :
10.1109/WI.2004.10101
Filename :
1410822