Title :
Application of Internet Technology and Web Information Extraction Wrapper Based on DOM for Agricultural Data Acquisition
Author :
Liming Luo;Wen Lu;Bing Wei;Ye Qin;Yeqing Xiong
Author_Institution :
Coll. of Inf. Eng., Capital Normal Univ., Beijing, China
Abstract :
This paper presents a method for constructing a DOM-based Web information extraction wrapper. By combining XPath and pattern matching, the wrapper can handle the two types of information at the same time, guided by source and target knowledge libraries. The knowledge libraries also help to extract more useful information for users. The paper describes in detail the process of building the wrapper and the corresponding algorithms, including DOM-based information judgment, determination of the key extraction block using ideas from hierarchical clustering, and determination of the extraction expression using inductive learning and natural language processing.
Keywords :
"Data mining","Web pages","Knowledge based systems","HTML","Internet","Feature extraction"
Conference_Title :
Network and Information Systems for Computers (ICNISC), 2015 International Conference on
DOI :
10.1109/ICNISC.2015.84