DocumentCode
3428339
Title
A Clustering Algorithm for Short Documents Based On Concept Similarity
Author
Peng, Jing ; Yang, Dong-Qing ; Wang, Jian-Wei ; Wu, Meng-Qing ; Wang, Jun-Gang
Author_Institution
Peking Univ., Beijing
fYear
2007
fDate
22-24 Aug. 2007
Firstpage
42
Lastpage
45
Abstract
In recent years, there has been increasing interest in clustering short documents. Existing work seldom considers the concept similarity between words, so clustering quality is often low. This paper proposes a new document-clustering algorithm based on concept similarity for Chinese text processing. Unlike traditional methods, the algorithm first converts each text into a word vector space model; second, it splits words into sets of concepts; third, it obtains the similarity between words by computing the inner products between their concepts; fourth, it computes the similarity between texts from the similarity between their words. Finally, in a two-phase procedure, the algorithm clusters a specified set of documents. Extensive experiments demonstrate the validity and performance of the algorithm.
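The pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the `CONCEPTS` dictionary, the whitespace tokenizer (Chinese text would need a word segmenter), the cosine-normalized inner product, the best-match averaging in `text_similarity`, and the threshold value. The abstract does not describe the two-phase clustering procedure, so `greedy_cluster` is a simple single-pass stand-in, not the authors' method.

```python
from collections import Counter
from math import sqrt

# Hypothetical word -> concept weights. The abstract does not name the
# concept resource the authors used, so this toy dictionary is an assumption.
CONCEPTS = {
    "car": {"vehicle": 1.0, "machine": 0.5},
    "truck": {"vehicle": 1.0, "cargo": 0.5},
    "apple": {"fruit": 1.0, "food": 0.5},
    "pear": {"fruit": 1.0, "food": 0.5},
}


def word_vector(text):
    """Step 1: bag-of-words vector (whitespace tokenization is an assumption)."""
    return Counter(text.lower().split())


def word_similarity(w1, w2):
    """Steps 2-3: normalized inner product between the words' concept vectors."""
    if w1 == w2:
        return 1.0
    c1, c2 = CONCEPTS.get(w1, {}), CONCEPTS.get(w2, {})
    dot = sum(weight * c2.get(concept, 0.0) for concept, weight in c1.items())
    norm1 = sqrt(sum(weight * weight for weight in c1.values()))
    norm2 = sqrt(sum(weight * weight for weight in c2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # a word without known concepts matches nothing else
    return dot / (norm1 * norm2)


def text_similarity(t1, t2):
    """Step 4: symmetric average of each word's best match in the other text."""
    v1, v2 = word_vector(t1), word_vector(t2)
    if not v1 or not v2:
        return 0.0

    def directed(a, b):
        total = sum(a.values())
        best = sum(count * max(word_similarity(w, x) for x in b)
                   for w, count in a.items())
        return best / total

    return 0.5 * (directed(v1, v2) + directed(v2, v1))


def greedy_cluster(texts, threshold=0.5):
    """Stand-in clustering: attach each text to the first cluster whose
    first member is similar enough, otherwise start a new cluster."""
    clusters = []
    for text in texts:
        for cluster in clusters:
            if text_similarity(text, cluster[0]) >= threshold:
                cluster.append(text)
                break
        else:
            clusters.append([text])
    return clusters
```

With the toy dictionary, `greedy_cluster(["car", "apple", "truck", "pear"])` groups the two vehicle words together and the two fruit words together, even though no two documents share a literal word; that is the effect concept similarity is meant to provide for short documents.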
Keywords
natural languages; pattern clustering; text analysis; Chinese text processing; concept similarity; data clustering; short document clustering algorithm; Books; Clustering algorithms; Computer science; Data engineering; Data mining; Natural languages; Research and development; Text processing; Web pages;
fLanguage
English
Publisher
IEEE
Conference_Titel
Communications, Computers and Signal Processing, 2007. PacRim 2007. IEEE Pacific Rim Conference on
Conference_Location
Victoria, BC
Print_ISBN
978-1-4244-1189-4
Electronic_ISBN
1-4244-1190-4
Type
conf
DOI
10.1109/PACRIM.2007.4313172
Filename
4313172