DocumentCode
3680998
Title
Application of Internet Technology and Web Information Extraction Wrapper Based on DOM for Agricultural Data Acquisition
Author
Liming Luo;Wen Lu;Bing Wei;Ye Qin;Yeqing Xiong
Author_Institution
Coll. of Inf. Eng., Capital Normal Univ., Beijing, China
fYear
2015
Firstpage
327
Lastpage
331
Abstract
This paper proposes a method for constructing a DOM-based Web information extraction wrapper. By combining XPath with pattern matching, the wrapper can handle the two types of information at the same time, guided by source and target knowledge libraries. The knowledge libraries also help extract more useful information for users. The paper describes in detail the process of building the wrapper and the corresponding algorithms, including DOM-based information judgment, determination of key extraction blocks using hierarchical clustering ideas, and determination of extraction expressions using inductive learning and natural language processing, among other steps.
Keywords
"Data mining","Web pages","Knowledge based systems","HTML","Internet","Feature extraction"
Publisher
IEEE
Conference_Titel
Network and Information Systems for Computers (ICNISC), 2015 International Conference on
Type
conf
DOI
10.1109/ICNISC.2015.84
Filename
7311897