Title :
Hybrid neural document clustering using guided self-organization and WordNet
Author :
Hung, Chihli ; Wermter, Stefan ; Smith, Peter
Author_Institution :
Hybrid Intelligent Syst., Univ. of Sunderland, UK
Abstract :
Document clustering is a text-processing task that groups documents with similar concepts. It's usually considered an unsupervised learning approach because there's no teacher to guide the training process, and topical information is often assumed to be unavailable. A guided approach to document clustering that integrates linguistic top-down knowledge from WordNet into text vector representations based on the extended significance vector weighting technique improves both classification accuracy and average quantization error. In our guided self-organization approach, we integrate topical and semantic information from WordNet. Because a document-training set with preclassified information implies relationships between a word and its preference class, we propose a novel document vector representation approach to extract these relationships for document clustering. Furthermore, by merging statistical methods, competitive neural models, and semantic relationships from symbolic WordNet, our hybrid learning approach is robust and scales up to a real-world task of clustering 100,000 news documents.
Keywords :
Internet; document handling; learning (artificial intelligence); pattern clustering; self-organising feature maps; WordNet; average quantization error; competitive neural model; document-training set; guided self-organization; hybrid neural document clustering; linguistic top-down knowledge; novel document vector representation; preference class; text vector representation; unsupervised learning; vector weighting technique; Data mining; Entropy; Humans; Merging; Neural networks; Supervised learning; Testing; Text processing; Vector quantization
Journal_Title :
Intelligent Systems, IEEE
DOI :
10.1109/MIS.2004.1274914