DocumentCode
928897
Title
Hybrid neural document clustering using guided self-organization and WordNet
Author
Hung, Chihli ; Wermter, Stefan ; Smith, Peter
Author_Institution
Hybrid Intelligent Syst., Univ. of Sunderland, UK
Volume
19
Issue
2
fYear
2004
Firstpage
68
Lastpage
77
Abstract
Document clustering is text processing that groups documents with similar concepts. It's usually considered an unsupervised learning approach because there's no teacher to guide the training process, and topical information is often assumed to be unavailable. A guided approach to document clustering that integrates linguistic top-down knowledge from WordNet into text vector representations based on the extended significance vector weighting technique improves both classification accuracy and average quantization error. In our guided self-organization approach we integrate topical and semantic information from WordNet. Because a document-training set with preclassified information implies relationships between a word and its preference class, we propose a novel document vector representation approach to extract these relationships for document clustering. Furthermore, merging statistical methods, competitive neural models, and semantic relationships from symbolic WordNet, our hybrid learning approach is robust and scales up to a real-world task of clustering 100,000 news documents.
Keywords
Internet; document handling; learning (artificial intelligence); pattern clustering; self-organising feature maps; WordNet; average quantization error; competitive neural model; document-training set; guided self-organization; hybrid neural document clustering; linguistic top-down knowledge; novel document vector representation; preference class; text vector representation; unsupervised learning; vector weighting technique; Data mining; Entropy; Humans; Merging; Neural networks; Supervised learning; Testing; Text processing; Unsupervised learning; Vector quantization;
fLanguage
English
Journal_Title
Intelligent Systems, IEEE
Publisher
ieee
ISSN
1541-1672
Type
jour
DOI
10.1109/MIS.2004.1274914
Filename
1274914