• DocumentCode
    3039280
  • Title
    Text Classification Using Semi-supervised Clustering
  • Author
    Zhang, Wen ; Yoshida, Taketoshi ; Tang, Xijin
  • fYear
    2009
  • fDate
    24-26 July 2009
  • Firstpage
    197
  • Lastpage
    200
  • Abstract
    In this paper, mixture models are used to classify documents. The basic assumption is that each class in a document collection is composed of a number of mixture components. By identifying these components in the collection, the document classes can be distinguished from one another. A semi-supervised clustering method is proposed to identify the components (clusters); it further uses unlabeled data to produce more accurate clusters that correspond to the components of the document classes. Experimental results show that, in text classification, the proposed method outperforms a support vector machine (SVM) with a linear kernel and performs comparably to a Bayesian classifier with expectation maximization (EM).
  • Keywords
    pattern clustering; text analysis; Bayesian classifier; expectation maximization; linear kernel; mixture model; semisupervised clustering method; support vector machine; text classification; Bayesian methods; Clustering algorithms; Clustering methods; Internet; Kernel; Knowledge engineering; Partitioning algorithms; Support vector machine classification; Support vector machines; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Business Intelligence and Financial Engineering, 2009. BIFE '09. International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-0-7695-3705-4
  • Type
    conf
  • DOI
    10.1109/BIFE.2009.54
  • Filename
    5208902