Title :
Using semantic similarity matrix for defining operations involved in NTSO for clustering 20NewsGroups
Author_Institution :
Sch. of Comput. & Inf. Eng., Inha Univ., Incheon, South Korea
Abstract :
In this research, we propose a similarity-matrix-based version of NTSO (Neural Text Self Organization) as an approach to text clustering. Traditional approaches to text clustering require documents to be encoded into numerical vectors, and this encoding causes two main problems: huge dimensionality and sparse distribution. To address these problems, we propose encoding documents into string vectors and using NTSO, a string-vector-based neural network, for text clustering. By encoding documents in this alternative form, we attempt to avoid both problems entirely. As empirical validation, the proposed approach will be compared with other approaches with respect to clustering performance and speed.
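The contrast between the two encodings can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, since the abstract does not give the definitions: the toy corpus, the helper names, the rule of keeping the most frequent words as the string vector, and the co-occurrence formula used to fill the word-by-word similarity matrix are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Toy corpus (hypothetical; the paper itself uses 20NewsGroups).
docs = [
    "neural network training neural network",
    "network training for text clustering",
    "text clustering groups text documents",
]

def tokenize(text):
    return text.lower().split()

# Vocabulary over the whole corpus: one dimension per distinct word.
vocab = sorted({w for d in docs for w in tokenize(d)})

def numerical_vector(text):
    """Bag-of-words encoding: length grows with the vocabulary, mostly zeros."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocab]

def string_vector(text, size=3):
    """Assumed string-vector encoding: the `size` most frequent words, in order.
    Ties are broken alphabetically; short documents are padded with ""."""
    counts = Counter(tokenize(text))
    ranked = sorted(counts, key=lambda w: (-counts[w], w))[:size]
    return ranked + [""] * (size - len(ranked))

# Documents in which each word occurs, used for the assumed similarity.
doc_sets = {w: {i for i, d in enumerate(docs) if w in tokenize(d)} for w in vocab}

def word_similarity(a, b):
    """Assumed matrix entry: 2 * |D(a) & D(b)| / (|D(a)| + |D(b)|)."""
    da, db = doc_sets.get(a, set()), doc_sets.get(b, set())
    if not da and not db:
        return 0.0
    return 2 * len(da & db) / (len(da) + len(db))

# Word-by-word similarity matrix, stored as a nested dictionary.
sim_matrix = {a: {b: word_similarity(a, b) for b in vocab} for a in vocab}

def string_vector_similarity(u, v):
    """Average of the position-wise word similarities of two string vectors."""
    return sum(word_similarity(a, b) for a, b in zip(u, v)) / len(u)

num_vecs = [numerical_vector(d) for d in docs]
str_vecs = [string_vector(d) for d in docs]
```

In this sketch the numerical vector has one entry per vocabulary word, so its length and its share of zeros grow with the corpus, whereas the string vector keeps a fixed, small number of words. Similarity between string vectors is then read from the word-by-word matrix instead of being computed from numerical features.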
Keywords :
matrix algebra; neural nets; pattern clustering; text analysis; vectors; 20NewsGroups; NTSO; neural network; neural text self organization; numerical vector; semantic similarity matrix; string vector; text clustering; Artificial neural networks; Clustering algorithms; Encoding; Finite element methods; Semantics; Text categorization; Training;
Conference_Title :
Evolutionary Computation (CEC), 2010 IEEE Congress on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-6909-3
DOI :
10.1109/CEC.2010.5586335