DocumentCode :
449837
Title :
Document Clustering with Semantic Analysis
Author :
Wang, Yong ; Hodges, Julia
Author_Institution :
Mississippi State University
Volume :
3
fYear :
2006
fDate :
04-07 Jan. 2006
Abstract :
Document clustering generates clusters from the whole document collection automatically and is used in many fields, including data mining and information retrieval. In the traditional vector space model, the unique words occurring in the document set are used as the features. But because of the synonym problem and the polysemous problem, such a bag of original words cannot represent the content of a document precisely. In this paper, we investigate using the sense disambiguation method to identify the sense of words to construct the feature vector for document representation. Our experimental results demonstrate that in most conditions, using sense can improve the performance of our document clustering system. But the comprehensive statistical analysis performed indicates that the differences between using original single words and using senses of words are not statistically significant. In this paper, we also provide an evaluation of several basic clustering algorithms for algorithm selection.
Keywords :
Clustering algorithms; Computer science; Data engineering; Data mining; Databases; Frequency; Information retrieval; Partitioning algorithms; Statistical analysis; Thesauri;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS '06), 2006
ISSN :
1530-1605
Print_ISBN :
0-7695-2507-5
Type :
conf
DOI :
10.1109/HICSS.2006.129
Filename :
1579400