DocumentCode
2860466
Title
Pseudo-Supervised Clustering for Text Documents
Author
Maggini, M. ; Rigutini, L. ; Turchi, M.
Author_Institution
Università di Siena, Italy
fYear
2004
fDate
20-24 Sept. 2004
Firstpage
363
Lastpage
369
Abstract
Effective solutions for Web search engines can take advantage of algorithms that automatically organize documents into homogeneous clusters. Unfortunately, document clustering is difficult, especially when the documents share a common set of topics, as in vertical search engines. In this paper we propose two clustering algorithms that can be tuned using feedback from an expert. The feedback is used to choose an appropriate basis for representing the documents, and the clustering is then performed in the projected space. The algorithms are evaluated on a dataset of papers from computer science conferences. The results show that an appropriate choice of the representation basis can yield better performance than the original vector space model.
Keywords
Application software; Clustering algorithms; Clustering methods; Computer science; Feedback; Frequency; Navigation; Search engines; Text processing; Web search;
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
Print_ISBN
0-7695-2100-2
Type
conf
DOI
10.1109/WI.2004.10138
Filename
1410827