DocumentCode :
3039280
Title :
Text Classification Using Semi-supervised Clustering
Author :
Zhang, Wen ; Yoshida, Taketoshi ; Tang, Xijin
fYear :
2009
fDate :
24-26 July 2009
Firstpage :
197
Lastpage :
200
Abstract :
This paper uses mixture models to classify documents. The basic assumption is that each class in a document collection is composed of a number of mixture components. By identifying these components in the collection, the document classes can be distinguished from one another. A semi-supervised clustering method is proposed to identify the components (clusters); in addition, unlabeled data is used to produce more accurate clusters that correspond to the components of the document classes. Experimental results show that the proposed method outperforms a support vector machine (SVM) with a linear kernel and performs comparably to a Bayesian classifier with expectation maximization (EM) in text classification.
Keywords :
pattern clustering; text analysis; Bayesian classifier; expectation maximization; linear kernel; mixture model; semisupervised clustering method; support vector machine; text classification; Bayesian methods; Clustering algorithms; Clustering methods; Internet; Kernel; Knowledge engineering; Partitioning algorithms; Support vector machine classification; Support vector machines; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Business Intelligence and Financial Engineering, 2009. BIFE '09. International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-0-7695-3705-4
Type :
conf
DOI :
10.1109/BIFE.2009.54
Filename :
5208902