Title :
Feature overlap-based dynamic self organizing model for hierarchical text clustering
Author :
Nathawitharana, Nilupulee ; Matharage, Sumith ; Alahakoon, D.
Author_Institution :
Sch. of Inf. & Bus. Analytics, Deakin Univ., Melbourne, VIC, Australia
Abstract :
In text document clustering, documents are represented as feature vectors whose features can be either words or phrases. Documents can belong to different topics when categorized by humans; however, obtaining a one-to-one mapping between features and topics is almost impossible, since the same features can and will be used in documents on different topics. Such common features result in overlap in text clustering, and as such traditional cluster purity measures may not be practical or meaningful. In this paper, a new methodology and algorithm are introduced that consider the feature overlap between clusters when clustering text documents. Hierarchical clustering facilitated by the Growing Self-Organizing Map (GSOM) is used together with the calculated feature overlap to check the possibility of obtaining clusters with minimum feature overlap. We also present the experimental results obtained by following the proposed methodology with the new algorithm.
Keywords :
pattern clustering; self-organising feature maps; text analysis; GSOM; dynamic self organizing model; feature overlap; feature vectors; growing self-organizing map; hierarchical text clustering; text document clustering; Algorithm design and analysis; Clustering algorithms; Conferences; Feature extraction; Indexes; Vectors; Hierarchical clustering
Conference_Title :
Industrial and Information Systems (ICIIS), 2013 8th IEEE International Conference on
Conference_Location :
Peradeniya
Print_ISBN :
978-1-4799-0908-7
DOI :
10.1109/ICIInfS.2013.6732016