• Title of article

    A three-phase approach to document clustering based on topic significance degree

  • Author/Authors

    Ma, Yinglong; Wang, Yao-xing; Jin, Beihong

  • Issue Information
    Journal with serial issue number, year 2014
  • Pages
    8
  • From page
    8203
  • To page
    8210
  • Abstract
    Topic models can project documents into a topic space, which facilitates effective document clustering. Selecting a good topic model and improving clustering performance are two highly correlated problems in topic-based document clustering. In this paper, we propose a three-phase approach to topic-based document clustering. In the first phase, we determine the best topic model: we present a formal concept of the significance degree of topics, together with topic selection criteria, through which we find the best number of the most suitable topics from the original topic model discovered by LDA. In the second phase, we choose the initial cluster centers using the k-means++ algorithm. In the third phase, we take these initial cluster centers and run the k-means algorithm to cluster the documents. Three clustering solutions based on the three-phase approach are used for document clustering. Experiments on the three solutions compare them and illustrate the effectiveness and efficiency of our approach.
  • Keywords
    Document clustering, K-means, K-means++, Topic model
  • Journal title
    Expert Systems with Applications
  • Serial Year
    2014
  • Record number

    2355344
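
The three phases described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the document-topic matrix `doc_topic` is assumed to come from an LDA model fitted elsewhere, and because the abstract does not define the significance degree, `select_topics` uses mean topic weight as a stand-in criterion. All function names and the toy data are hypothetical.

```python
import random


def select_topics(doc_topic, keep):
    """Phase 1 (placeholder): keep the `keep` topics with the highest mean weight.

    Assumption: mean weight stands in for the paper's significance degree,
    which the abstract does not define.
    """
    n_topics = len(doc_topic[0])
    mean_weight = [sum(row[t] for row in doc_topic) / len(doc_topic)
                   for t in range(n_topics)]
    ranked = sorted(range(n_topics), key=lambda t: mean_weight[t], reverse=True)
    kept = sorted(ranked[:keep])
    return [[row[t] for t in kept] for row in doc_topic], kept


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """Phase 2: k-means++ seeding.

    Each new center is drawn with probability proportional to its squared
    distance from the nearest center chosen so far.
    """
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0:  # every point coincides with an existing center
            centers.append(list(rng.choice(points)))
            continue
        r = rng.random() * total
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:
                centers.append(list(p))
                break
        else:  # floating-point safety net
            centers.append(list(points[-1]))
    return centers


def kmeans(points, centers, max_iter=100):
    """Phase 3: Lloyd's k-means iterations starting from the given centers."""
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)),
                          key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's center unchanged
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy document-topic matrix: 6 documents, 4 topics, two evident groups.
doc_topic = [
    [0.70, 0.20, 0.05, 0.05],
    [0.65, 0.25, 0.05, 0.05],
    [0.75, 0.15, 0.05, 0.05],
    [0.20, 0.70, 0.05, 0.05],
    [0.25, 0.65, 0.05, 0.05],
    [0.15, 0.75, 0.05, 0.05],
]

rng = random.Random(0)
reduced, kept = select_topics(doc_topic, keep=2)
centers = kmeans_pp_init(reduced, k=2, rng=rng)
labels, centers = kmeans(reduced, centers)
print(kept, labels)
```

In practice the seeding and clustering steps would more likely come from an established library; the hand-written versions here only make the order of the phases explicit.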