DocumentCode :
3533136
Title :
Automatic text categorization using NTC
Author :
Jo, Taeho
Author_Institution :
Sch. of Comput. & Inf. Eng., Inha Univ., Incheon, South Korea
fYear :
2009
fDate :
28-31 July 2009
Firstpage :
26
Lastpage :
31
Abstract :
In this research, we propose NTC (Neural Text Categorizer) as an approach to text categorization. Traditional approaches to text categorization require encoding documents into numerical vectors, which leads to two main problems: huge dimensionality and sparse distribution in each numerical vector. In this research, documents are encoded into string vectors instead of numerical vectors, and a new neural network called NTC, which receives a string vector as its input vector, is used for text categorization. The goal of this research is to avoid the two main problems by encoding documents into a structured representation other than numerical vectors. We will validate the performance of NTC by comparing it with other machine learning algorithms on the standard test bed, Reuters-21578.
Keywords :
classification; neural nets; text analysis; huge dimensionality distribution; machine learning algorithm; neural text categorizer; sparse distribution; Classification algorithms; Costs; Encoding; Frequency; Indexing; Machine learning algorithms; Neural networks; Robustness; Testing; Text categorization
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Networked Digital Technologies, 2009. NDT '09. First International Conference on
Conference_Location :
Ostrava
Print_ISBN :
978-1-4244-4614-8
Electronic_ISBN :
978-1-4244-4615-5
Type :
conf
DOI :
10.1109/NDT.2009.5272193
Filename :
5272193