Title :
Web Page Categorization Based on k-NN and SVM Hybrid Pattern Recognition Algorithm
Author :
Shi, Xuelin ; Zhao, Ying ; Dong, Xiangjun
Author_Institution :
Sch. of Inf. Sci. & Technol., Beijing Univ. of Chem. Technol., Beijing
Abstract :
Traditional information retrieval (IR) methods use keyword matching to filter documents, but they usually retrieve unrelated Web pages. To classify Web pages effectively, we present a Web page categorization algorithm named WebPSC (Web Page Similarity Categorization). The algorithm uses latent semantic indexing (LSI) to model Web pages and categorizes them with a hybrid pattern recognition algorithm that combines k-NN and support vector machine (SVM) classifiers. As an implementation of WebPSC, we present an intelligent agent system that acquires user interests and helps users retrieve Web pages. Empirical results indicate that the method can reach high levels of accuracy in Web page classification.
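The abstract names the building blocks (LSI for page modeling, then k-NN and SVM for categorization) but does not say how the two classifiers are combined. The following is a minimal sketch under stated assumptions, not the authors' WebPSC implementation: it assumes k-NN decides when its neighbors agree unanimously and a linear SVM decides otherwise. All function names, parameter values, and the toy term-document matrix are hypothetical and chosen only for illustration.

```python
import numpy as np

def lsi_fit(term_doc, rank):
    """LSI via truncated SVD of a (terms x documents) matrix."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :rank]
    doc_vecs = Vt[:rank].T * s[:rank]   # one latent-space row per document
    return U_k, doc_vecs

def lsi_project(U_k, term_vec):
    """Fold a new term vector into the same latent space (U_k^T q)."""
    return term_vec @ U_k

def knn_votes(train_vecs, train_labels, query, k):
    """Labels of the k training vectors most cosine-similar to the query."""
    norms = np.linalg.norm(train_vecs, axis=1) * np.linalg.norm(query)
    sims = (train_vecs @ query) / np.maximum(norms, 1e-12)
    nearest = np.argsort(-sims)[:k]
    return train_labels[nearest]

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Binary linear SVM (labels +1/-1) fitted by subgradient descent
    on the L2-regularized hinge loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1                    # margin violators
        grad_w = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def hybrid_predict(train_vecs, train_labels, svm, query, k=3):
    """ASSUMED combination rule: unanimous k-NN vote wins, else use the SVM."""
    votes = knn_votes(train_vecs, train_labels, query, k)
    if np.all(votes == votes[0]):
        return int(votes[0])
    w, b = svm
    return 1 if query @ w + b >= 0 else -1

# Toy data (invented): rows are terms, columns are pages.
# Terms: svm, kernel, margin, goal, match, team
term_doc = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 0, 0, 0, 0],
    [1, 0, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 0],
    [0, 0, 0, 1, 0, 2],
], dtype=float)
labels = np.array([1, 1, 1, -1, -1, -1])   # +1: machine learning, -1: sports

U_k, doc_vecs = lsi_fit(term_doc, rank=2)
svm = train_linear_svm(doc_vecs, labels)

def classify(term_counts, k=3):
    """Categorize a new page given its raw term counts."""
    query = lsi_project(U_k, np.asarray(term_counts, dtype=float))
    return hybrid_predict(doc_vecs, labels, svm, query, k)

print(classify([1, 1, 0, 0, 0, 0]))        # page about svm and kernel
print(classify([0, 0, 0, 1, 1, 0]))        # page about goal and match
print(classify([2, 1, 0, 1, 0, 0], k=5))   # mixed neighbors, SVM decides
```

With k=5 on six training pages the neighbor vote can never be unanimous, so the last call exercises the SVM fallback. Other combination rules (for example, running k-NN only on the support vectors) would fit the abstract's description equally well.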
Keywords :
Web sites; information retrieval; pattern recognition; support vector machines; SVM; Web page similarity categorization; intelligent agent system; k-NN; k-nearest neighbor; latent semantic indexing; pattern recognition algorithm; support vector machine; Indexing; Information filtering; Information filters; Large scale integration; Matched filters; Support vector machine classification; Web pages; Categorization; Singular Value Decomposition
Conference_Title :
Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
Conference_Location :
Shandong
Print_ISBN :
978-0-7695-3305-6
DOI :
10.1109/FSKD.2008.574