DocumentCode
449837
Title
Document Clustering with Semantic Analysis
Author
Wang, Yong ; Hodges, Julia
Author_Institution
Mississippi State University
Volume
3
fYear
2006
fDate
04-07 Jan. 2006
Abstract
Document clustering automatically groups an entire document collection into clusters and is used in many fields, including data mining and information retrieval. In the traditional vector space model, the unique words occurring in the document set serve as the features. Because of synonymy and polysemy, however, such a bag of original words cannot represent the content of a document precisely. In this paper, we investigate using a word sense disambiguation method to identify the senses of words and to construct the feature vector for document representation from those senses. Our experimental results show that, under most conditions, using senses improves the performance of our document clustering system. However, a comprehensive statistical analysis indicates that the differences between using original single words and using word senses are not statistically significant. We also evaluate several basic clustering algorithms to guide algorithm selection.
Keywords
Clustering algorithms; Computer science; Data engineering; Data mining; Databases; Frequency; Information retrieval; Partitioning algorithms; Statistical analysis; Thesauri
fLanguage
English
Publisher
IEEE
Conference_Titel
Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS '06), 2006
ISSN
1530-1605
Print_ISBN
0-7695-2507-5
Type
conf
DOI
10.1109/HICSS.2006.129
Filename
1579400