DocumentCode
3533136
Title
Automatic text categorization using NTC
Author
Jo, Taeho
Author_Institution
Sch. of Comput. & Inf. Eng., Inha Univ., Incheon, South Korea
fYear
2009
fDate
28-31 July 2009
Firstpage
26
Lastpage
31
Abstract
In this research, we propose NTC (Neural Text Categorizer) as an approach to text categorization. Traditional approaches to text categorization require encoding documents into numerical vectors, which leads to two main problems: huge dimensionality and sparse distribution in each numerical vector. In this research, documents are encoded into string vectors instead of numerical vectors, and a new neural network called NTC, which receives a string vector as its input, is used for text categorization. The goal of this research is to avoid these two problems by encoding documents into a structured representation other than numerical vectors. We will validate the performance of NTC by comparing it with other machine learning algorithms on the standard test bed, Reuters-21578.
Keywords
classification; neural nets; text analysis; huge dimensionality distribution; machine learning algorithm; neural text categorizer; sparse distribution; Classification algorithms; Costs; Encoding; Frequency; Indexing; Machine learning algorithms; Neural networks; Robustness; Testing; Text categorization
fLanguage
English
Publisher
IEEE
Conference_Titel
Networked Digital Technologies, 2009. NDT '09. First International Conference on
Conference_Location
Ostrava
Print_ISBN
978-1-4244-4614-8
Electronic_ISBN
978-1-4244-4615-5
Type
conf
DOI
10.1109/NDT.2009.5272193
Filename
5272193