• DocumentCode
    2771232
  • Title

    A sequence based dynamic SOM model for text clustering

  • Author

    Gunasinghe, Upuli ; Matharage, Sumith ; Alahakoon, Damminda

  • Author_Institution
    Fac. of IT, Monash Univ., Melbourne, VIC, Australia
  • fYear
    2012
  • fDate
    10-15 June 2012
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Text clustering can be considered a four-step process consisting of feature extraction, text representation, document clustering and cluster interpretation. Most text clustering models treat text as an unordered collection of words. However, the semantics of text would be better captured if word sequences are taken into account. In this paper we propose a sequence-based text clustering model in which four novel sequence-based components are introduced, one in each of the four steps of the text clustering process. Experiments conducted on the Reuters dataset and Sydney Morning Herald (SMH) news archives demonstrate the advantage of the proposed sequence-based model, in terms of capturing context with semantics, accuracy and speed, compared with clustering of documents based on single words and with n-gram-based models.
  • Keywords
    feature extraction; pattern clustering; self-organising feature maps; text analysis; Reuters dataset; Sydney Morning Herald news archives; cluster interpretation; document clustering; sequence based dynamic SOM model; text clustering process; text representation; Adaptation models; Clustering algorithms; Equations; Indexes; Mathematical model; Semantics; Growing Self Organizing Map; Sequence learning; Text clustering; Text feature selection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), The 2012 International Joint Conference on
  • Conference_Location
    Brisbane, QLD
  • ISSN
    2161-4393
  • Print_ISBN
    978-1-4673-1488-6
  • Electronic_ISBN
    2161-4393
  • Type

    conf

  • DOI
    10.1109/IJCNN.2012.6252474
  • Filename
    6252474