DocumentCode :
424030
Title :
Integrating phrases to enhance HSOMART-based document clustering
Author :
Hussin, Mahmoud F. ; Kamel, Mohamed S.
Author_Institution :
Dept. of Comput. Sci. & Autom. Control, Alexandria Univ., Egypt
Volume :
3
fYear :
2004
fDate :
25-29 July 2004
Firstpage :
2347
Abstract :
Document clustering is one of the popular techniques that assist users in organizing collections of documents. Two successful models of unsupervised neural networks, self-organizing map (SOM) and adaptive resonance theory (ART), have shown promising results in this task. Most of the existing neural network based document clustering techniques rely on a "bag of words" document representation. Each word in the document is considered as a separate feature, ignoring the word order. We investigate the use of phrases rather than words as document features applied to our proposed document clustering technique, called hierarchical SOMART (HSOMART), which is a hierarchical network built up from independent SOM and ART neural networks. We describe a phrase grammar extraction technique, and the proposed HSOMART. The experimental results of clustering documents from the REUTERS corpus using the extracted phrases as features show an improvement in the clustering performance evaluated using the entropy and F-measure.
Keywords :
ART neural nets; document handling; entropy; pattern clustering; self-organising feature maps; tree data structures; unsupervised learning; ART; F-measure; REUTERS corpus; adaptive resonance theory; bag of words; document clustering techniques; document representation; entropy; feature extraction; hierarchical network; phrase grammar extraction technique; phrase integration; self organizing map; unsupervised neural networks; Automatic control; Clustering algorithms; Computer science; Feature extraction; Neural networks; Organizing; Resonance; Subspace constraints; Text categorization; Tree graphs;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks, 2004. Proceedings. 2004 IEEE International Joint Conference on
ISSN :
1098-7576
Print_ISBN :
0-7803-8359-1
Type :
conf
DOI :
10.1109/IJCNN.2004.1380993
Filename :
1380993