• DocumentCode
    928897
  • Title
    Hybrid neural document clustering using guided self-organization and WordNet
  • Author
    Hung, Chihli ; Wermter, Stefan ; Smith, Peter
  • Author_Institution
    Hybrid Intelligent Syst., Univ. of Sunderland, UK
  • Volume
    19
  • Issue
    2
  • fYear
    2004
  • Firstpage
    68
  • Lastpage
    77
  • Abstract
    Document clustering is text processing that groups documents with similar concepts. It's usually considered an unsupervised learning approach because there's no teacher to guide the training process, and topical information is often assumed to be unavailable. A guided approach to document clustering that integrates linguistic top-down knowledge from WordNet into text vector representations based on the extended significance vector weighting technique improves both classification accuracy and average quantization error. In our guided self-organization approach we integrate topical and semantic information from WordNet. Because a document-training set with preclassified information implies relationships between a word and its preference class, we propose a novel document vector representation approach to extract these relationships for document clustering. Furthermore, merging statistical methods, competitive neural models, and semantic relationships from symbolic WordNet, our hybrid learning approach is robust and scales up to a real-world task of clustering 100,000 news documents.
  • Keywords
    Internet; document handling; learning (artificial intelligence); pattern clustering; self-organising feature maps; WordNet; average quantization error; competitive neural model; document-training set; guided self-organization; hybrid neural document clustering; linguistic top-down knowledge; novel document vector representation; preference class; text vector representation; unsupervised learning; vector weighting technique; Data mining; Entropy; Humans; Merging; Neural networks; Supervised learning; Testing; Text processing; Unsupervised learning; Vector quantization;
  • fLanguage
    English
  • Journal_Title
    Intelligent Systems, IEEE
  • Publisher
    IEEE
  • ISSN
    1541-1672
  • Type
    jour
  • DOI
    10.1109/MIS.2004.1274914
  • Filename
    1274914