DocumentCode :
3113154
Title :
Web document categorization by Support Vector Clustering
Author :
Shi, Daming ; Tsui, Ming Hei ; Liu, Jigang
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore
fYear :
2008
fDate :
12-15 Oct. 2008
Firstpage :
1483
Lastpage :
1488
Abstract :
Search engines have proven effective for retrieving information from the World Wide Web. Traditionally, search results are arranged in a list ordered by popularity and relevance. However, the enormous number of matched Web pages makes it inefficient for users to locate the most relevant ones. Proper organization of the search results is therefore important for improving the browsability of Web search. In this paper, we propose applying Support Vector Clustering (SVC) to the search results, reorganizing them into groups of similar context so that users can browse them effectively. SVC is a nonparametric clustering algorithm that can form clusters of arbitrary shape without requiring the number of clusters to be specified. It is a kernel clustering method that maps the data via a nonlinear function to a high-dimensional feature space. Choosing accurate parameters (kernel width and penalty coefficient) is crucial to obtaining the optimal clustering result, so we also propose an automatic tuning method for the SVC parameters. Experimental results demonstrate the effectiveness and usefulness of the proposed method, and its performance is comparable to that of other popular clustering techniques.
Keywords :
Internet; information retrieval; nonlinear functions; online front-ends; pattern clustering; search engines; support vector machines; text analysis; Web browsability; Web document categorization; automatic tuning method; information retrieval; nonlinear function; search engine; support vector clustering; Clustering algorithms; Clustering methods; Information retrieval; Kernel; Search engines; Simulated annealing; Static VAr compensators; Support vector machines; Web pages; Web sites; Document clustering; simulated annealing; support vector clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Systems, Man and Cybernetics, 2008. SMC 2008. IEEE International Conference on
Conference_Location :
Singapore
ISSN :
1062-922X
Print_ISBN :
978-1-4244-2383-5
Electronic_ISBN :
1062-922X
Type :
conf
DOI :
10.1109/ICSMC.2008.4811495
Filename :
4811495