Title :
Web Search with Text Categorization Using Probabilistic Framework of SVM
Author :
Lim, B.P.C. ; Tsui, M.H. ; Charastrakul, V. ; Shi, D.
Author_Institution :
Nanyang Technol. Univ., Singapore
Abstract :
Text categorization algorithms help manage the ever-increasing volume of documents available both online and offline. Their ability to organize large numbers of documents into predefined categories significantly increases efficiency and reduces the human effort required. Recently, the support vector machine (SVM) has gained popularity due to its excellent generalization ability and fast training on large datasets. However, the performance of an SVM relies heavily on the penalty coefficient parameter and the kernel parameters. In this paper, we implement a probabilistic framework for the support vector machine (PSVM) that automatically tunes the penalty coefficient and kernel parameters via the Markov chain Monte Carlo (MCMC) method, and we apply it to Web searching via text categorization. The probabilistic framework was tested on a well-known benchmark text categorization dataset, and the results from PSVM were compared against those of the conventional SVM, K-nearest neighbor with P-tree (KNN-Ptree), and plain KNN. The proposed methodology is applied to develop a Web search engine.
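The abstract does not describe the sampler itself. As a hedged illustration of what MCMC tuning of the penalty coefficient C and a kernel parameter can look like, the sketch below runs random-walk Metropolis-Hastings over theta = (log C, log gamma), where gamma is assumed to be an RBF kernel width. The function `toy_log_posterior`, its centre values, and all numeric settings (step size, number of steps, burn-in) are hypothetical stand-ins, not the authors' PSVM formulation; a real implementation would score theta against the training documents.

```python
import math
import random
from typing import Callable, List, Tuple

Theta = Tuple[float, float]  # (log C, log gamma); gamma is an assumed RBF width


def metropolis_hastings(
    log_target: Callable[[Theta], float],
    start: Theta,
    n_steps: int = 5000,
    step_size: float = 0.3,
    seed: int = 0,
) -> List[Theta]:
    """Random-walk Metropolis-Hastings over log-hyperparameters.

    log_target returns an unnormalised log-density for theta.
    """
    rng = random.Random(seed)
    theta = start
    current = log_target(theta)
    samples: List[Theta] = []
    for _ in range(n_steps):
        # Symmetric Gaussian proposal: the Hastings correction cancels,
        # so acceptance depends only on the ratio of target densities.
        proposal = (theta[0] + rng.gauss(0.0, step_size),
                    theta[1] + rng.gauss(0.0, step_size))
        cand = log_target(proposal)
        # Accept with probability min(1, exp(cand - current)).
        if cand >= current or rng.random() < math.exp(cand - current):
            theta, current = proposal, cand
        samples.append(theta)
    return samples


def toy_log_posterior(theta: Theta) -> float:
    # Hypothetical stand-in: a Gaussian bump centred on C = 10, gamma = 0.1
    # (in log space, standard deviation 0.5). Not derived from any dataset.
    mu = (math.log(10.0), math.log(0.1))
    return -sum((t - m) ** 2 for t, m in zip(theta, mu)) / (2 * 0.5 ** 2)


samples = metropolis_hastings(toy_log_posterior, start=(0.0, 0.0))
kept = samples[1000:]  # discard burn-in
log_c = sum(s[0] for s in kept) / len(kept)
log_gamma = sum(s[1] for s in kept) / len(kept)
# Exponentiating the mean of the log-samples gives a geometric-mean point estimate.
print(math.exp(log_c), math.exp(log_gamma))
```

Sampling in log space keeps C and gamma positive without constraints, and averaging post-burn-in samples is one simple way to turn the chain into point estimates for training the final classifier.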
Keywords :
Internet; Markov processes; Monte Carlo methods; classification; probability; support vector machines; text analysis; Markov Chain Monte Carlo method; Web search; kernel parameter; penalty coefficient parameter; probabilistic framework; support vector machine; text categorization; Cybernetics; Data compression; Humans; Kernel; Support vector machine classification; Testing
Conference_Title :
Systems, Man and Cybernetics, 2006. SMC '06. IEEE International Conference on
Conference_Location :
Taipei
Print_ISBN :
1-4244-0099-6
Electronic_ISBN :
1-4244-0100-3
DOI :
10.1109/ICSMC.2006.384566