• DocumentCode
    2860466
  • Title

    Pseudo-Supervised Clustering for Text Documents

  • Author

    Maggini, M. ; Rigutini, L. ; Turchi, M.

  • Author_Institution
    Università di Siena, Italy
  • fYear
    2004
  • fDate
    20-24 Sept. 2004
  • Firstpage
    363
  • Lastpage
    369
  • Abstract
    Effective solutions for Web search engines can take advantage of algorithms for the automatic organization of documents into homogeneous clusters. Unfortunately, document clustering is not an easy task, especially when the documents share a common set of topics, as in vertical search engines. In this paper we propose two clustering algorithms that can be tuned by the feedback of an expert. The feedback is used to choose an appropriate basis for the representation of documents, and the clustering is then performed in the projected space. The algorithms are evaluated on a dataset containing papers from computer science conferences. The results show that an appropriate choice of the representation basis can yield better performance than the original vector space model.
  • Keywords
    Application software; Clustering algorithms; Clustering methods; Computer science; Feedback; Frequency; Navigation; Search engines; Text processing; Web search;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
  • Print_ISBN
    0-7695-2100-2
  • Type

    conf

  • DOI
    10.1109/WI.2004.10138
  • Filename
    1410827