Title :
Web document categorization by Support Vector Clustering
Author :
Shi, Daming ; Tsui, Ming Hei ; Liu, Jigang
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore
Abstract :
Search engines have proven effective for retrieving information from the World Wide Web. Traditionally, search results are arranged in a ranked list ordered by popularity and relevance. However, the enormous number of matched Web pages makes it inefficient for users to locate the most relevant ones. Properly organizing the search results is therefore important for improving the browsability of Web search. In this paper, we propose applying Support Vector Clustering (SVC) to the search results to reorganize them into groups with similar context, enabling users to browse the results effectively. SVC is a nonparametric clustering algorithm that can find clusters of arbitrary shape without requiring the number of clusters to be specified in advance. It is a kernel clustering method that maps the data via a nonlinear function into a high-dimensional feature space. To obtain the optimal clustering result, choosing appropriate values for the SVC parameters (kernel width and penalty coefficient) is crucial. We therefore propose an automatic tuning method for the SVC parameters. Experimental results demonstrate the effectiveness and usefulness of the proposed method, and its performance is comparable to that of other popular clustering techniques.
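The abstract describes SVC only at a high level. The following is a minimal Python sketch of the standard SVC formulation (Ben-Hur et al.), not the authors' implementation. Points are mapped with a Gaussian kernel K(x, y) = exp(-q * ||x - y||^2), where q is the kernel width; the smallest sphere enclosing their images is found by solving the dual problem, in which the penalty coefficient C bounds each multiplier (0 <= beta_i <= C). Two points share a cluster if the straight segment between them stays inside the sphere in feature space. The names `svc` and `gaussian_kernel`, the pairwise (SMO-style) solver, and the `tol`, `max_iter`, and `samples` settings are assumptions for illustration, and inputs are assumed to be numeric vectors (for example, term-weight vectors of search results). The sketch takes q and C as fixed inputs and does not reproduce the paper's automatic tuning method.

```python
import math
from itertools import combinations


def gaussian_kernel(x, y, q):
    """Gaussian kernel K(x, y) = exp(-q * ||x - y||^2); q is the kernel width parameter."""
    return math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))


def svc(points, q, C, tol=1e-6, max_iter=10000, samples=10):
    """Hypothetical minimal Support Vector Clustering; returns one cluster label per point."""
    n = len(points)
    if C < 1.0 / n:
        raise ValueError("C must be at least 1/n for the dual problem to be feasible")
    K = [[gaussian_kernel(p, r, q) for r in points] for p in points]

    # Dual of the minimal enclosing sphere. Because K(x, x) = 1 for the Gaussian
    # kernel, maximising the dual is the same as minimising beta^T K beta
    # subject to sum(beta) = 1 and 0 <= beta_i <= C.
    beta = [1.0 / n] * n
    grad = [2.0 * sum(K[i][j] * beta[j] for j in range(n)) for i in range(n)]
    for _ in range(max_iter):
        # Maximal violating pair: raise beta_i (smallest gradient), lower beta_j (largest).
        up = [k for k in range(n) if beta[k] < C - 1e-12]
        down = [k for k in range(n) if beta[k] > 1e-12]
        if not up:
            break
        i = min(up, key=lambda k: grad[k])
        j = max(down, key=lambda k: grad[k])
        if grad[j] - grad[i] < tol:
            break
        # Exact line search along e_i - e_j, clipped to the box constraints.
        curvature = 2.0 * (K[i][i] + K[j][j] - 2.0 * K[i][j])
        step = (grad[j] - grad[i]) / curvature if curvature > 1e-12 else math.inf
        step = min(step, C - beta[i], beta[j])
        beta[i] += step
        beta[j] -= step
        for k in range(n):
            grad[k] += 2.0 * step * (K[k][i] - K[k][j])

    const = sum(beta[a] * beta[b] * K[a][b] for a in range(n) for b in range(n))

    def r2(x):
        # Squared feature-space distance from the image of x to the sphere centre.
        return 1.0 - 2.0 * sum(b * gaussian_kernel(p, x, q) for b, p in zip(beta, points)) + const

    # Squared sphere radius, averaged over unbounded support vectors (0 < beta_i < C);
    # falls back to all support vectors if every one of them is bounded.
    sv = [k for k in range(n) if 1e-8 < beta[k] < C - 1e-8]
    sv = sv or [k for k in range(n) if beta[k] > 1e-8]
    radius2 = sum(r2(points[k]) for k in sv) / len(sv)

    def connected(a, b):
        # Sample the segment a-b (endpoints included); it is an edge only if
        # every sample maps inside the sphere. Points outside the sphere
        # therefore become singleton clusters in this sketch.
        for s in range(samples + 1):
            t = s / samples
            y = [pa + t * (pb - pa) for pa, pb in zip(points[a], points[b])]
            if r2(y) > radius2 + tol:
                return False
        return True

    # Clusters are the connected components of the adjacency graph (union-find).
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in combinations(range(n), 2):
        if find(a) != find(b) and connected(a, b):
            parent[find(a)] = find(b)
    roots = {}
    return [roots.setdefault(find(a), len(roots)) for a in range(n)]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
    print(svc(pts, q=1.0, C=1.0))  # expected: [0, 0, 0, 0, 1, 1, 1, 1]
```

In this example the two well-separated groups of points come out as two clusters. Increasing q tends to produce tighter and more numerous clusters, while decreasing C allows more points to fall outside the sphere as bounded support vectors; in this sketch such outliers end up as singleton clusters, whereas the original SVC leaves them unassigned.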
Keywords :
Internet; information retrieval; nonlinear functions; online front-ends; pattern clustering; search engines; support vector machines; text analysis; Web browsability; Web document categorization; automatic tuning method; nonlinear function; search engine; support vector clustering; Clustering algorithms; Clustering methods; Kernel; Simulated annealing; Web pages; Web sites; Document clustering
Conference_Title :
Systems, Man and Cybernetics, 2008. SMC 2008. IEEE International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-1-4244-2383-5
ISSN :
1062-922X
DOI :
10.1109/ICSMC.2008.4811495