Title :
CUCS: A Web Page Classification Algorithm for Large Training Set
Author :
Wang, Jing ; Cai, Hongming ; Xu, Boyi ; Jiang, Lihong
Author_Institution :
Sch. of Software, Shanghai Jiao Tong Univ., Shanghai
Abstract :
This paper presents CUCS (Combined UC and SVM), a new Web page classification algorithm for large training sets. CUCS combines the advantages of SVM (Support Vector Machine) and UC (Unsupervised Clustering) to achieve both high precision and high speed. In the training stage, CUCS uses UC to obtain cluster centers, which include positive-example centers and negative-example centers. It then prunes the training set and trains an SVM classifier on the pruned set. In the classification stage, CUCS computes the minimum distance from a Web page to the positive centers and the minimum distance from the page to the negative centers. If the difference between the two distances is large enough, the Web page is classified by UC; otherwise, it is classified by the pruned SVM. In experiments, CUCS achieves precision that is much higher than that of UC and slightly higher than that of SVM. In terms of running time, CUCS takes longer than UC but far less time than SVM.
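The classification-stage rule described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function name `cucs_classify`, the `margin` threshold, the use of Euclidean distance, and the `svm_predict` callback standing in for the pruned SVM are all assumptions, since the abstract does not specify the distance measure or how "large enough" is determined.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cucs_classify(
    page: Vector,
    pos_centers: Sequence[Vector],
    neg_centers: Sequence[Vector],
    margin: float,
    svm_predict: Callable[[Vector], bool],
) -> bool:
    """Return True if the page is classified as positive.

    Assumptions (not stated in the abstract): Euclidean distance,
    a fixed `margin` threshold, and a boolean `svm_predict` fallback.
    """
    # Minimum distance from the page to any positive / negative center.
    d_pos = min(math.dist(page, c) for c in pos_centers)
    d_neg = min(math.dist(page, c) for c in neg_centers)

    # Clear-cut case: the two distances differ by at least the margin,
    # so the nearest cluster center decides (the UC branch).
    if abs(d_pos - d_neg) >= margin:
        return d_pos < d_neg

    # Ambiguous case: defer to the classifier trained on the pruned set.
    return svm_predict(page)
```

Under these assumptions, a page lying close to a positive center and far from every negative center is labeled without consulting the SVM at all, which is consistent with the abstract's claim that CUCS costs far less time than SVM alone.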
Keywords :
Internet; classification; pattern clustering; support vector machines; unsupervised learning; CUCS; SVM; Web page classification algorithm; large training set; support vector machine; unsupervised clustering; Classification algorithms; Clustering algorithms; Costs; Educational institutions; Parallel processing; Software algorithms; Support vector machine classification; Web pages; Clustering Algorithm; Web page classification
Conference_Title :
Network and Parallel Computing, 2008. NPC 2008. IFIP International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3354-4
DOI :
10.1109/NPC.2008.11