Title :
Text clustering using a multiset model
Author :
Takumi, Satoshi ; Miyamoto, Sadaaki
Author_Institution :
Master's Program in Risk Eng., Univ. of Tsukuba, Tsukuba, Japan
Abstract :
This paper studies agglomerative hierarchical clustering methods based on the bag-of-words model, with applications to text mining. In particular, a multiset-theoretical model is used, and an asymmetric similarity measure is studied in addition to two symmetric similarity measures. The dendrogram output by hierarchical clustering often has reversals, and a reversal makes it difficult to obtain clusters from the dendrogram. We therefore give a condition under which a dendrogram has no reversals, and prove that the dendrograms produced by the proposed methods have none. Examples based on Twitter and Wikipedia data show how the methods work.
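The ideas named in the abstract (bags of words as multisets, a similarity between them, agglomerative merging, and reversals in the resulting merge levels) can be illustrated with a minimal Python sketch. The abstract does not define the paper's similarity measures or its cluster-update rule, so everything here is an assumption made for illustration: the min/max-count (Jaccard-style) similarity, the use of the multiset sum as the merged cluster, and the function names bag, similarity, agglomerate, and has_reversal.

```python
from collections import Counter


def bag(text):
    """Bag of words as a multiset: maps each word to its count."""
    return Counter(text.lower().split())


def similarity(a, b):
    """Assumed symmetric similarity: sum of min counts over sum of max counts."""
    inter = sum((a & b).values())  # multiset intersection (min of counts)
    union = sum((a | b).values())  # multiset union (max of counts)
    return inter / union if union else 0.0


def agglomerate(docs):
    """Repeatedly merge the most similar pair of clusters.

    Assumption: a merged cluster is the multiset sum of its two parts.
    Returns the similarity level at which each merge happened, in order.
    """
    clusters = [bag(d) for d in docs]
    levels = []
    while len(clusters) > 1:
        best, i, j = max(
            (similarity(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        levels.append(best)
    return levels


def has_reversal(levels):
    """True if some merge occurs at a higher similarity than the one before it."""
    return any(later > earlier for earlier, later in zip(levels, levels[1:]))


if __name__ == "__main__":
    levels = agglomerate(["a b", "a b", "c d"])
    print(levels, has_reversal(levels))
```

With similarities, merge levels in a reversal-free dendrogram never increase from one merge to the next, which is what has_reversal checks. The check is only empirical for a given input; it does not reproduce the paper's proof.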
Keywords :
data mining; pattern clustering; pattern matching; set theory; text analysis; Twitter; Wikipedia; agglomerative hierarchical clustering method; asymmetric similarity measure; multiset theoretical model; text clustering; text mining application; Data models; Electronic publishing; Encyclopedias; Internet; Text mining; Twitter; hierarchical clustering; multiset; text mining;
Conference_Title :
Granular Computing (GrC), 2011 IEEE International Conference on
Conference_Location :
Kaohsiung
Print_ISBN :
978-1-4577-0372-0
DOI :
10.1109/GRC.2011.6122670