Web information processing and extracting

Author

Gao, Kai ; Zong, Bao-qin ; Yang, Xiu-li

Author_Institution

Dept. of Inf. Sci. & Eng., Hebei Univ. of Sci. & Technol., Shijiazhuang, China

Volume

5

fYear

2010

fDate

11-14 July 2010

Firstpage

2350

Lastpage

2355

Abstract

With the rapid growth of the web, search engines have become an important tool for retrieving relevant information from the Internet. Because of limited bandwidth, storage, and other constraints, a general-purpose search engine is not suitable for some situations. A topical search engine, which collects domain-specific content by focused crawling, is needed. It can provide higher accuracy than general search because its domain collection contains little irrelevant information, and this makes web information processing and extraction necessary. This paper presents some strategies for web information processing, together with analysis and extraction based on data content mining. The experimental results validate the suitability of the approach, and some remaining problems are presented at the end.

Keywords

Internet; data mining; information retrieval; search engines; Web information extracting; Web information processing; data content mining; search engine; Accuracy; Data mining; Databases; Materials; Noise; Web pages; Crawling; Information extracting; Information processing; Topical search;

fLanguage

English

Publisher

IEEE

Conference_Titel

Machine Learning and Cybernetics (ICMLC), 2010 International Conference on

Conference_Location

Qingdao

Print_ISBN

978-1-4244-6526-2

Type

conf

DOI

10.1109/ICMLC.2010.5580664

Filename

5580664