Title :
Text clustering with NTSO (neural text self organizer)
Author :
Jo, Taeho ; Japkowicz, Nathalie
Author_Institution :
Sch. of Inf. Technol. & Eng., Ottawa Univ., Ont., Canada
Date :
31 July-4 Aug. 2005
Abstract :
Text clustering is the process of partitioning a collection of texts into subgroups whose members are similar in content. This study proposes a new neural network, NTSO (neural text self organizer), designed for text clustering. NTSO takes string vectors rather than numerical vectors as its input vectors, and its weight vectors differ from those of other unsupervised neural networks such as Kohonen networks and ART (adaptive resonance theory), although it resembles Kohonen networks in its architecture and its learning process. Intuitively, a text is better represented by a string vector than by a numerical vector. Representing texts as numerical vectors leads to two main problems: sparse distribution and huge dimensionality of the feature vectors. NTSO is proposed as an unsupervised neural network that clusters texts using string vectors in order to address these problems.
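The contrast between the two representations can be illustrated with a minimal sketch. It is not taken from the paper: the abstract does not define how a string vector is built, so the sketch assumes that a string vector is a short ordered list of a text's most frequent words. The helper names (tokenize, numerical_vector, string_vector), the toy documents, and the length d = 3 are all hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it on whitespace, dropping edge punctuation."""
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return [w for w in words if w]


def numerical_vector(text, vocabulary):
    """Bag-of-words counts: one dimension per vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocabulary]


def string_vector(text, d):
    """ASSUMED encoding: the d most frequent words, ties broken alphabetically."""
    counts = Counter(tokenize(text))
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:d]


# Toy corpus, invented for illustration only.
docs = [
    "neural networks cluster texts",
    "kohonen networks are neural networks",
    "stock prices fell sharply today",
]

# The numerical representation needs one dimension per distinct corpus word.
vocabulary = sorted({w for doc in docs for w in tokenize(doc)})
vec = numerical_vector(docs[0], vocabulary)

# Most entries are zero, and the dimension grows with the vocabulary.
sparsity = vec.count(0) / len(vec)

# The string vector keeps a fixed, small length regardless of vocabulary size.
sv = string_vector(docs[1], 3)

print(len(vocabulary), sparsity, sv)
```

Even on this three-document corpus, most entries of the numerical vector are zero and its length equals the vocabulary size, while the string vector stays at the chosen length d. How NTSO compares string vectors and updates its weight vectors is not described in the abstract and is not sketched here.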
Keywords :
feature extraction; neural nets; pattern clustering; text analysis; Kohonen network architecture level; Kohonen network learning process; NTSO; content similar text; feature vectors; neural network; neural text self organizer; sparse distribution; string vectors; text clustering; text segmenting; Adaptive systems; Distributed computing; Information systems; Information technology; Neural networks; Organizing; Resonance; Resource management; Subspace constraints; Text categorization;
Conference_Title :
Neural Networks, 2005. IJCNN '05. Proceedings. 2005 IEEE International Joint Conference on
Print_ISBN :
0-7803-9048-2
DOI :
10.1109/IJCNN.2005.1555892