DocumentCode
495194
Title
Web Data Extraction Based on Label Library
Author
Tan, Shoubiao ; Fan, Jin ; Jiang, Yuan
Author_Institution
Anhui Univ., Hefei, China
Volume
5
fYear
2009
fDate
March 31 2009-April 2 2009
Firstpage
134
Lastpage
138
Abstract
This paper proposes a Web data extraction technique based on a label library for extracting information from data-intensive Web pages. The technique uses the label library to eliminate conceptual ambiguity in the contents of Web pages, mines data regions with the MDR repeated patterns discovery algorithm, and then recognizes the structure of those regions and extracts their data through a novel hierarchic pattern recognition and data extraction algorithm. Experiments showed that the technique is effective.
Keywords
Internet; data mining; information retrieval; MDR repeated patterns discovery algorithm; Web data extraction; Web pages; hierarchic pattern recognition; information extraction; label library; Computer science; Data engineering; Information resources; Labeling; Libraries; Pattern recognition; Programming profession; Writing; Web information extraction; data intensive Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Information Engineering, 2009 WRI World Congress on
Conference_Location
Los Angeles, CA
Print_ISBN
978-0-7695-3507-4
Type
conf
DOI
10.1109/CSIE.2009.595
Filename
5170512