  • DocumentCode
    3760821
  • Title
    Semi-supervised clustering with soft labels
  • Author
    Cynthia Marea Nebu; Sumy Joseph
  • Author_Institution
    Amal Jyothi College of Engineering, Kerala, India
  • fYear
    2015
  • Firstpage
    612
  • Lastpage
    616
  • Abstract
    This paper devises a semi-supervised learning algorithm for clustering text documents. The proposed algorithm clusters high-dimensional documents using the k-means algorithm. It first reduces the dimensionality of the text so that the clustering algorithm can perform well in a low-dimensional feature space. It also removes irrelevant, redundant and noisy features from the corpus that might otherwise mislead the underlying algorithm. The proposed method then employs the pLSA (probabilistic latent semantic analysis) algorithm to generate soft labels from this reduced feature subset, and these soft labels, together with the class labels, guide the k-means algorithm. Experiments on the Reuters-21578 dataset show that the proposed method outperforms many previous clustering algorithms that use no supervision.
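    The abstract does not say exactly how the pLSA soft labels and the class labels are combined, so the following is only a minimal sketch under stated assumptions, not the authors' method. Plain EM for pLSA turns a document-term count matrix into per-document topic distributions P(z|d) (the soft labels); these are then assumed to serve as the low-dimensional features for a constrained k-means, in which a few labelled documents initialise the centroids and keep their given class. The feature-selection step is omitted, and the function names `plsa_soft_labels` and `constrained_kmeans` and the toy count matrix are hypothetical.

```python
# Hypothetical sketch: pLSA soft labels feeding a label-constrained k-means.
# Not the pipeline from the paper; feature selection is omitted.
import random


def _normalize(row):
    """Scale non-negative numbers to sum to 1 (uniform if they are all zero)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsa_soft_labels(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA with EM; counts[d][w] is the count of term w in document d.

    Returns P(z|d) for every document, i.e. one soft label vector per document.
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(counts), len(counts[0])
    # Random initialisation of P(z|d) and P(w|z).
    p_z_d = [_normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [_normalize([rng.random() for _ in range(n_terms)]) for _ in range(n_topics)]
    for _ in range(n_iter):
        z_d = [[0.0] * n_topics for _ in range(n_docs)]
        w_z = [[0.0] * n_terms for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_terms):
                c = counts[d][w]
                if c == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(post)
                if total == 0:
                    continue
                for z in range(n_topics):
                    share = c * post[z] / total
                    z_d[d][z] += share  # expected counts for P(z|d)
                    w_z[z][w] += share  # expected counts for P(w|z)
        # M-step: renormalise the expected counts.
        p_z_d = [_normalize(row) for row in z_d]
        p_w_z = [_normalize(row) for row in w_z]
    return p_z_d


def constrained_kmeans(points, k, seeds, n_iter=20):
    """k-means whose centroids start from labelled points.

    seeds maps a point index to its known class in range(k); every class
    needs at least one seed, and seeded points always keep their label.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    labels = [seeds.get(i, -1) for i in range(len(points))]
    centroids = [mean([points[i] for i, c in seeds.items() if c == j]) for j in range(k)]
    for _ in range(n_iter):
        # Assign only the unlabelled points to their nearest centroid.
        for i, p in enumerate(points):
            if i not in seeds:
                labels[i] = min(range(k), key=lambda j: dist2(p, centroids[j]))
        # Recompute each centroid from its current members.
        for j in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == j]
            if members:
                centroids[j] = mean(members)
    return labels


# Toy corpus: documents 0-2 use terms 0-2, documents 3-5 use terms 3-5.
counts = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [1, 2, 3, 0, 0, 0],
    [0, 0, 0, 3, 2, 1],
    [0, 0, 0, 2, 3, 2],
    [0, 0, 0, 1, 2, 3],
]
soft = plsa_soft_labels(counts, n_topics=2)
labels = constrained_kmeans(soft, k=2, seeds={0: 0, 3: 1})
print(labels)
```

    Letting labelled documents fix the initial centroids and keep their classes is one common way for class labels to guide k-means; whether the paper uses this scheme, or instead appends the soft labels to the reduced term features, is not stated in the abstract.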
  • Keywords
    "Clustering algorithms","Feature extraction","Semisupervised learning","Support vector machines","Noise measurement","Semantics","Algorithm design and analysis"
  • Publisher
    IEEE
  • Conference_Titel
    Control Communication & Computing India (ICCC), 2015 International Conference on
  • Type
    conf
  • DOI
    10.1109/ICCC.2015.7432969
  • Filename
    7432969