• DocumentCode
    3245232
  • Title
    Frequent term based peer-to-peer text clustering
  • Author
    He, Qing ; Li, Tingting ; Zhuang, Fuzhen ; Shi, Zhongzhi
  • Author_Institution
    Key Lab. of Intell. Inf. Process., Chinese Acad. of Sci., Beijing, China
  • fYear
    2010
  • fDate
    20-21 Oct. 2010
  • Firstpage
    352
  • Lastpage
    355
  • Abstract
    Text clustering is an important technology for automatically structuring large document collections, and it is especially valuable in peer-to-peer networks. Because documents are high-dimensional, much communication can be saved if each node obtains an approximate clustering result through a distributed algorithm instead of transferring its documents to a central site for clustering. Most existing text clustering algorithms for unstructured peer-to-peer networks are based on the K-means algorithm. A problem with those algorithms is that clustering quality may decrease as the network grows. In this paper, we propose a text clustering algorithm for peer-to-peer networks based on frequent term sets. It requires relatively low communication volume while achieving a clustering result whose quality is not affected by the size of the network. Moreover, it produces a term set describing each cluster, which gives people a clear understanding of the clustering result and helps users find resources in the network or manage their local documents in accordance with the whole network.
  • Keywords
    data mining; pattern clustering; peer-to-peer computing; text analysis; K-means algorithm; frequent term sets; peer-to-peer text clustering; Copper; distributed data mining; frequent term set; peer-to-peer; text clustering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Knowledge Acquisition and Modeling (KAM), 2010 3rd International Symposium on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4244-8004-3
  • Type
    conf
  • DOI
    10.1109/KAM.2010.5646177
  • Filename
    5646177