• DocumentCode
    636020
  • Title

    Hybrid approach for visualization of documents clusters using GHSOM and Sammon projection

  • Author

    Butka, P. ; Pocsova, J.

  • Author_Institution
    Dept. of Cybernetics & Artificial Intell., Tech. Univ. of Kosice, Kosice, Slovakia
  • fYear
    2013
  • fDate
    23-25 May 2013
  • Firstpage
    337
  • Lastpage
    342
  • Abstract
    This paper presents a hybrid approach to the visualization of document sets that combines a hierarchical clustering method, based on the Growing Hierarchical Self-Organizing Map (GHSOM) algorithm, with Sammon projection. Algorithms based on self-organizing maps provide a robust clustering method suitable for visualizing larger numbers of documents on grid-based 2D maps. Sammon projection is a nonlinear projection method suited mostly to visualizing smaller sets of objects on (usually 2D) maps. We implemented and tested a combination of these approaches: the starting set of documents is first organized by GHSOM into subsets of similar documents; then, for the clusters at the end of the clustering phase, which contain smaller numbers of inputs, Sammon maps are created to distinguish the individual documents within those clusters. Clusters were described using characteristic terms extracted by information gain analysis. The hybrid algorithm was implemented with the existing JBOWL library, and English-language documents were used for testing.
  • Keywords
    data visualisation; document handling; information analysis; natural language processing; pattern clustering; self-organising feature maps; English language; GHSOM; Sammon projection; clustering phase; documents cluster visualization; grid-based 2D maps; growing hierarchical self-organizing maps algorithm; hierarchical clustering method; hybrid algorithm; information gain analysis; library JBOWL; nonlinear projection method; robust clustering method; Adaptation models; Algorithm design and analysis; Clustering algorithms; Data mining; Stress; Vectors; Visualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Applied Computational Intelligence and Informatics (SACI), 2013 IEEE 8th International Symposium on
  • Conference_Location
    Timisoara
  • Print_ISBN
    978-1-4673-6397-6
  • Type

    conf

  • DOI
    10.1109/SACI.2013.6608994
  • Filename
    6608994
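
The abstract names Sammon projection as the method used to lay out the documents of a small cluster on a 2D map. As a rough illustration only, the sketch below computes the Sammon stress E = (1/c) * sum over i<j of (d*_ij - d_ij)^2 / d*_ij, where d*_ij are distances in the original space, d_ij are distances on the map, and c is the sum of all d*_ij. It then reduces that stress with plain fixed-step gradient descent, which is a simplification and not Sammon's original second-order update. It is not taken from the paper or from JBOWL; the function names, the Euclidean distance, the step size and the toy vectors are all assumptions, and the input points are assumed to be pairwise distinct.

```python
import math
import random


def pairwise_distances(points):
    """Return the full matrix of Euclidean distances between points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def sammon_stress(high_d, low_d):
    """Sammon stress between original distances and map distances.

    Assumes every off-diagonal entry of high_d is strictly positive.
    """
    n = len(high_d)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    scale = sum(high_d[i][j] for i, j in pairs)
    error = sum((high_d[i][j] - low_d[i][j]) ** 2 / high_d[i][j] for i, j in pairs)
    return error / scale


def sammon_project(points, steps=300, learning_rate=0.1, seed=0, eps=1e-9):
    """Place points on a 2D map by gradient descent on the Sammon stress.

    Hypothetical helper: fixed-step gradient descent from a random start,
    chosen for brevity rather than convergence speed.
    """
    rng = random.Random(seed)
    n = len(points)
    high_d = pairwise_distances(points)
    scale = sum(high_d[i][j] for i in range(n) for j in range(i + 1, n))
    layout = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n)]
    for _ in range(steps):
        low_d = pairwise_distances(layout)
        new_layout = []
        for i in range(n):
            grad = [0.0, 0.0]
            for j in range(n):
                if i == j:
                    continue
                d_star = high_d[i][j]
                d = max(low_d[i][j], eps)  # guard against coincident map points
                weight = (d_star - d) / (d_star * d)
                for k in range(2):
                    grad[k] += -2.0 / scale * weight * (layout[i][k] - layout[j][k])
            new_layout.append([layout[i][k] - learning_rate * grad[k] for k in range(2)])
        layout = new_layout
    return layout


# Toy term-weight vectors standing in for the documents of one small cluster.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
layout = sammon_project(docs)
stress = sammon_stress(pairwise_distances(docs), pairwise_distances(layout))
```

A stress of zero means the map reproduces every original distance exactly. The cost grows with the square of the number of points, which fits the abstract's remark that Sammon projection is suited mostly to smaller sets.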