Title :
Web information processing and extracting
Author :
Gao, Kai ; Zong, Bao-qin ; Yang, Xiu-li
Author_Institution :
Dept. of Inf. Sci. & Eng., Hebei Univ. of Sci. & Technol., Shijiazhuang, China
Abstract :
With the rapid growth of the web, search engines have become an important tool for retrieving relevant information from the Internet. Because of limited bandwidth, storage, and other constraints, a general search engine is not suitable for some situations. What is needed is a topical search engine that collects domain-specific content by focused crawling. It can provide higher accuracy than general search because the domain collection contains little irrelevant information, so web information processing and extraction are necessary. This paper presents some strategies for web information processing, together with analysis and extraction based on data content mining. The experimental results validate the suitability of the approach, and some remaining problems are presented at the end.
Keywords :
Internet; data mining; information retrieval; search engines; Web information extracting; Web information processing; data content mining; search engine; Accuracy; Data mining; Databases; Materials; Noise; Web pages; Crawling; Information extracting; Information processing; Topical search;
Conference_Title :
Machine Learning and Cybernetics (ICMLC), 2010 International Conference on
Conference_Location :
Qingdao
Print_ISBN :
978-1-4244-6526-2
DOI :
10.1109/ICMLC.2010.5580664