• DocumentCode
    3186806
  • Title
    Feature overlap-based dynamic self organizing model for hierarchical text clustering
  • Author
    Nathawitharana, Nilupulee ; Matharage, Sumith ; Alahakoon, D.
  • Author_Institution
    Sch. of Inf. & Bus. Analytics, Deakin Univ., Melbourne, VIC, Australia
  • fYear
    2013
  • fDate
    17-20 Dec. 2013
  • Firstpage
    393
  • Lastpage
    398
  • Abstract
    In text document clustering, documents are represented as feature vectors whose features can be either words or phrases. Documents can belong to different topics when categorized by humans; however, obtaining a one-to-one mapping between features and topics is almost impossible, since the same features can, and will, be used in documents on different topics. Such common features result in overlap in text clustering, so traditional cluster purity measures may not be practical or meaningful. This paper introduces a new methodology and algorithm that consider the feature overlap between clusters when clustering text documents. Hierarchical clustering facilitated by the Growing Self-Organizing Map (GSOM) is used together with the calculated feature overlap to check whether clusters with minimum feature overlap can be obtained. We also present experimental results obtained by applying the proposed methodology with the new algorithm.
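    The abstract does not say how feature overlap between clusters is measured. The sketch below assumes one plausible measure and is not the authors' formulation. A cluster's feature set is every term with a non-zero weight in at least one member document, and the overlap between two clusters is the Jaccard index of their feature sets. The function names (cluster_features, jaccard, pairwise_overlap, mean_overlap) and the toy data are hypothetical. The GSOM clustering step is not shown, so cluster labels are taken as given.

```python
from itertools import combinations

def cluster_features(docs, labels):
    """Map each cluster label to the set of features (terms) used by its documents.

    Assumption: a feature is 'present' in a cluster if any member document
    gives it a non-zero weight.
    """
    features = {}
    for doc, label in zip(docs, labels):
        present = {term for term, weight in doc.items() if weight > 0}
        features.setdefault(label, set()).update(present)
    return features

def jaccard(a, b):
    """Jaccard index |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_overlap(features):
    """Overlap for every unordered pair of clusters, keyed by (label_x, label_y)."""
    return {
        (x, y): jaccard(features[x], features[y])
        for x, y in combinations(sorted(features), 2)
    }

def mean_overlap(features):
    """Average pairwise overlap; lower means more feature-separated clusters."""
    pairs = pairwise_overlap(features)
    return sum(pairs.values()) / len(pairs) if pairs else 0.0

# Toy example: bag-of-words documents with hypothetical cluster labels.
docs = [
    {"gene": 2, "protein": 1},
    {"gene": 1, "cell": 3},
    {"market": 2, "price": 1},
    {"market": 1, "gene": 1},
]
labels = ["bio", "bio", "econ", "econ"]

feats = cluster_features(docs, labels)
print(pairwise_overlap(feats))  # → {('bio', 'econ'): 0.2}
print(mean_overlap(feats))      # → 0.2
```

    Here the shared term "gene" is the only overlap between the two clusters' five distinct terms, which gives 1/5. Under this assumed measure, a hierarchical procedure could compare overlap scores across levels of the GSOM hierarchy and prefer splits that lower the mean overlap.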
  • Keywords
    pattern clustering; self-organising feature maps; text analysis; GSOM; dynamic self organizing model; feature overlap; feature vectors; growing self-organizing map; hierarchical text clustering; text document clustering; Algorithm design and analysis; Clustering algorithms; Conferences; Feature extraction; Indexes; Vectors; Feature overlap; Growing Self-Organizing Map; Hierarchical clustering; Text document clustering;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Industrial and Information Systems (ICIIS), 2013 8th IEEE International Conference on
  • Conference_Location
    Peradeniya
  • Print_ISBN
    978-1-4799-0908-7
  • Type
    conf
  • DOI
    10.1109/ICIInfS.2013.6732016
  • Filename
    6732016