  • DocumentCode
    424030
  • Title
    Integrating phrases to enhance HSOMART-based document clustering
  • Author
    Hussin, Mahmoud F.; Kamel, Mohamed S.
  • Author_Institution
    Dept. of Comput. Sci. & Autom. Control, Alexandria Univ., Egypt
  • Volume
    3
  • fYear
    2004
  • fDate
    25-29 July 2004
  • Firstpage
    2347
  • Abstract
    Document clustering is a popular technique for helping users organize collections of documents. Two successful models of unsupervised neural networks, the self-organizing map (SOM) and adaptive resonance theory (ART), have shown promising results in this task. Most existing neural-network-based document clustering techniques rely on a "bag of words" document representation, in which each word in the document is treated as a separate feature and word order is ignored. We investigate the use of phrases rather than individual words as document features in our proposed document clustering technique, hierarchical SOMART (HSOMART), a hierarchical network built from independent SOM and ART neural networks. We describe a phrase grammar extraction technique and the proposed HSOMART. Experimental results on clustering documents from the REUTERS corpus with the extracted phrases as features show improved clustering performance, as evaluated by entropy and F-measure.
  • Keywords
    ART neural nets; document handling; entropy; pattern clustering; self-organising feature maps; tree data structures; unsupervised learning; ART; F-measure; REUTERS corpus; adaptive resonance theory; bag of words; document clustering techniques; document representation; entropy; feature extraction; hierarchical network; phrase grammar extraction technique; phrase integration; self organizing map; unsupervised neural networks; Automatic control; Clustering algorithms; Computer science; Feature extraction; Neural networks; Organizing; Resonance; Subspace constraints; Text categorization; Tree graphs;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks, 2004. Proceedings. 2004 IEEE International Joint Conference on
  • ISSN
    1098-7576
  • Print_ISBN
    0-7803-8359-1
  • Type
    conf
  • DOI
    10.1109/IJCNN.2004.1380993
  • Filename
    1380993
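The abstract reports clustering quality using entropy and F-measure. The sketch below computes both using the definitions commonly used for document clustering evaluation: size-weighted cluster entropy, and a class-weighted F-measure that takes each class's best-matching cluster. It is an illustration only. The paper may use a different log base or normalization, and the function names (`clustering_entropy`, `clustering_f_measure`) and the toy REUTERS-style labels are hypothetical.

```python
import math
from collections import Counter


def clustering_entropy(labels, clusters):
    """Size-weighted entropy of the class distribution inside each cluster.

    Assumption: log base 2; lower is better (0.0 means every cluster is pure).
    """
    n = len(labels)
    counts = Counter(zip(labels, clusters))  # n_ij: class i docs in cluster j
    cluster_sizes = Counter(clusters)        # n_j
    total = 0.0
    for j, n_j in cluster_sizes.items():
        e_j = 0.0
        for (_, jj), n_ij in counts.items():
            if jj == j:
                p = n_ij / n_j  # share of cluster j belonging to class i
                e_j -= p * math.log2(p)
        total += (n_j / n) * e_j
    return total


def clustering_f_measure(labels, clusters):
    """Class-weighted F-measure using each class's best-matching cluster.

    Higher is better (1.0 means each class maps exactly onto one cluster).
    """
    n = len(labels)
    counts = Counter(zip(labels, clusters))
    class_sizes = Counter(labels)      # n_i
    cluster_sizes = Counter(clusters)  # n_j
    total = 0.0
    for i, n_i in class_sizes.items():
        best = 0.0
        for j, n_j in cluster_sizes.items():
            n_ij = counts[(i, j)]  # Counter returns 0 for missing pairs
            if n_ij == 0:
                continue
            recall = n_ij / n_i
            precision = n_ij / n_j
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


if __name__ == "__main__":
    # Hypothetical toy data: two topic labels, clustering that mixes them evenly.
    labels = ["earn", "earn", "acq", "acq"]
    clusters = [0, 1, 0, 1]
    print(clustering_entropy(labels, clusters))    # 1.0
    print(clustering_f_measure(labels, clusters))  # 0.5
```

Under these definitions, a better clustering shows lower entropy and higher F-measure, which is the direction of improvement the abstract describes for phrase features over single-word features.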