DocumentCode :
3687622
Title :
Automatic data extraction of websites using data path matching and alignment
Author :
Yu-Chun Chu;Chiun-Chieh Hsu;Chen-Jhe Lee;Yu-Ting Tsai
Author_Institution :
Department of Information Management, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.
fYear :
2015
Firstpage :
60
Lastpage :
64
Abstract :
Since most web pages present their main information in data records, extracting those records makes it possible to obtain and integrate data from diverse sources on the Internet. Consequently, data extraction from web pages has been a popular research topic over the last decade. This paper aims to automatically extract data records from web pages and to identify the data items within the extracted records. The proposed approach uses Data Path Matching to extract data records effectively and Data Path Code Alignment to identify data items efficiently. Experimental results indicate that the method extracts data effectively.
Keywords :
"Data mining","Web pages","HTML","Visualization","Information filters","Yttrium"
Publisher :
IEEE
Conference_Title :
Digital Information Processing and Communications (ICDIPC), 2015 Fifth International Conference on
Type :
conf
DOI :
10.1109/ICDIPC.2015.7323006
Filename :
7323006