Regional Information Center for Science and Technology

DocumentCode :

2924954

Title :

Text clustering using a multiset model

Author :

Takumi, Satoshi ; Miyamoto, Sadaaki

Author_Institution :

Master's Program in Risk Eng., Univ. of Tsukuba, Tsukuba, Japan

fYear :

2011

fDate :

8-10 Nov. 2011

Firstpage :

630

Lastpage :

635

Abstract :

This paper studies agglomerative hierarchical clustering methods based on the bag-of-words model, with applications to text mining. In particular, a multiset-theoretical model is used, and an asymmetric similarity measure is studied in addition to two symmetric similarity measures. The dendrogram output by hierarchical clustering often has reversals, and a reversal makes it difficult to obtain clusters from the dendrogram. We therefore give a condition under which a dendrogram has no reversals, and prove that the proposed methods produce dendrograms without reversals. Examples based on Twitter and Wikipedia data show how the methods work.
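
The ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not define its similarity measures, so the multiset Jaccard measure, the containment-style asymmetric measure, the multiset-sum merge rule, and all function names below are assumptions chosen only for illustration. A reversal is taken here to mean a later merge occurring at a higher similarity than an earlier one.

```python
from collections import Counter


def bag(text):
    """Bag of words as a multiset: word -> number of occurrences."""
    return Counter(text.lower().split())


def cardinality(m):
    """Size of a multiset: the sum of all occurrence counts."""
    return sum(m.values())


def sym_similarity(a, b):
    """Assumed symmetric measure: multiset Jaccard |A & B| / |A | B|."""
    union = cardinality(a | b)  # Counter | takes the max count per word
    return cardinality(a & b) / union if union else 0.0


def asym_similarity(a, b):
    """Assumed asymmetric measure: share of a's tokens also in b."""
    size = cardinality(a)
    return cardinality(a & b) / size if size else 0.0  # & takes the min count


def agglomerate(docs):
    """Repeatedly merge the most similar pair of clusters.

    A merged cluster is the multiset sum of its parts (an assumption).
    Returns the similarity level of each merge, in merge order.
    """
    clusters = [bag(d) for d in docs]
    levels = []
    while len(clusters) > 1:
        s, i, j = max(
            (sym_similarity(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        levels.append(s)
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return levels


def has_reversal(levels):
    """True if some merge happens at a higher similarity than the one before."""
    return any(later > earlier for earlier, later in zip(levels, levels[1:]))


levels = agglomerate(["a a b", "a b b", "c c"])
print(levels, has_reversal(levels))
```

With similarity-based merging, a dendrogram without reversals has merge levels that never increase, which is what lets a single threshold cut it into clusters.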

Keywords :

data mining; pattern clustering; pattern matching; set theory; text analysis; Twitter; Wikipedia; agglomerative hierarchical clustering method; asymmetric similarity measure; multiset theoretical model; text clustering; text mining application; Data models; Electronic publishing; Encyclopedias; Internet; Text mining; Twitter; hierarchical clustering; multiset; text mining;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Granular Computing (GrC), 2011 IEEE International Conference on

Conference_Location :

Kaohsiung

Print_ISBN :

978-1-4577-0372-0

Type :

conf

DOI :

10.1109/GRC.2011.6122670

Filename :

6122670

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2924954