DocumentCode
440172
Title
The design and implementation of a subject-oriented Web information classification system
Author
Huang, Yishan ; Wang, Qianping ; Yang, Jing ; Ding, Quan
Author_Institution
Sch. of Comput., China Univ. of Min. & Technol., JiangSu, China
Volume
2
fYear
2005
fDate
24-26 May 2005
Firstpage
836
Abstract
With the explosive growth of the World Wide Web, it is becoming increasingly difficult for users to collect and analyze Web pages that are relevant to a particular subject. This paper presents a subject-oriented Web information classification system (WICS), which efficiently collects Web pages, classifies them into several subjects, and provides the search results to users. Based on an analysis of ordinary search engines, Web text mining is introduced and applied to the WICS. The prototype employs text preprocessing, indexing, inverted files, and a vector space distance algorithm based on the vector space model (VSM). Initial experiments show that classifying Web information with the prototype makes it more convenient for users to query information and improves relevancy and precision.
Keywords
classification; data mining; document handling; search engines; Web page classification; Web page collection; Web text mining; World Wide Web; data mining; information inquiry; search engine; subject-oriented Web information classification system; vector space distance algorithm; vector space model; Data mining; Explosives; Frequency; Information analysis; Internet; Prototypes; Search engines; Text mining; Web pages; Wide area networks;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Supported Cooperative Work in Design, 2005. Proceedings of the Ninth International Conference on
Print_ISBN
1-84600-002-5
Type
conf
DOI
10.1109/CSCWD.2005.194294
Filename
1504201