Title :
Automatic text categorization using NTC
Author_Institution :
Sch. of Comput. & Inf. Eng., Inha Univ., Incheon, South Korea
Abstract :
In this research, we propose NTC (Neural Text Categorizer) as an approach to text categorization. Traditional approaches to text categorization require encoding documents into numerical vectors, which leads to two main problems: huge dimensionality and sparse distribution in each numerical vector. In this research, documents are instead encoded into string vectors, and a new neural network called NTC, which receives a string vector as its input, is used for text categorization. The goal of this research is to avoid these two problems by encoding documents into structured data other than numerical vectors. We will validate the performance of NTC by comparing it with other machine learning algorithms on the standard test bed, Reuters-21578.
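The contrast between the two encodings can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how a string vector is built, so the sketch assumes a string vector is the d most frequent words of a document, padded with empty strings. The names `tokenize`, `numerical_vector`, `string_vector`, and the two-document toy corpus are hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption; real systems also stem and drop stop words).
    return text.lower().split()


def numerical_vector(text, vocabulary):
    # Bag-of-words encoding: one dimension per vocabulary word.
    # The length grows with the corpus vocabulary and most entries are zero.
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def string_vector(text, d):
    # Assumed string-vector encoding: the d most frequent words of the document,
    # ties broken alphabetically, padded with "" when the document is short.
    counts = Counter(tokenize(text))
    ranked = sorted(counts, key=lambda word: (-counts[word], word))
    return ranked[:d] + [""] * max(0, d - len(ranked))


corpus = [
    "oil prices rise as oil supply falls",
    "wheat exports rise",
]
vocabulary = sorted({word for doc in corpus for word in tokenize(doc)})

num_vec = numerical_vector(corpus[1], vocabulary)  # length equals vocabulary size
str_vec = string_vector(corpus[1], 3)              # length fixed at d = 3

print(len(num_vec), num_vec.count(0))
print(str_vec)
```

In this toy corpus the numerical vector for the second document has eight dimensions, five of which are zero, while the string vector keeps a fixed length of three regardless of vocabulary size.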
Keywords :
classification; neural nets; text analysis; huge dimensionality distribution; machine learning algorithm; neural text categorizer; sparse distribution; Classification algorithms; Costs; Encoding; Frequency; Indexing; Machine learning algorithms; Neural networks; Robustness; Testing; Text categorization
Conference_Title :
Networked Digital Technologies, 2009. NDT '09. First International Conference on
Conference_Location :
Ostrava
Print_ISBN :
978-1-4244-4614-8
Electronic_ISBN :
978-1-4244-4615-5
DOI :
10.1109/NDT.2009.5272193