  • DocumentCode
    3533136
  • Title
    Automatic text categorization using NTC
  • Author
    Jo, Taeho
  • Author_Institution
    Sch. of Comput. & Inf. Eng., Inha Univ., Incheon, South Korea
  • fYear
    2009
  • fDate
    28-31 July 2009
  • Firstpage
    26
  • Lastpage
    31
  • Abstract
    In this research, we propose NTC (Neural Text Categorizer) as an approach to text categorization. Traditional approaches to text categorization require encoding documents into numerical vectors, which leads to two main problems: huge dimensionality and sparse distribution within each numerical vector. In this research, documents are instead encoded into string vectors, and a new neural network, NTC, which receives a string vector as its input, is used for text categorization. The goal of this research is to avoid these two problems by encoding documents into structured data other than numerical vectors. We will validate the performance of NTC by comparing it with other machine learning algorithms on the standard test bed, Reuters-21578.
  • Keywords
    classification; neural nets; text analysis; huge dimensionality distribution; machine learning algorithm; neural text categorizer; sparse distribution; Classification algorithms; Costs; Encoding; Frequency; Indexing; Machine learning algorithms; Neural networks; Robustness; Testing; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Networked Digital Technologies, 2009. NDT '09. First International Conference on
  • Conference_Location
    Ostrava
  • Print_ISBN
    978-1-4244-4614-8
  • Electronic_ISBN
    978-1-4244-4615-5
  • Type
    conf
  • DOI
    10.1109/NDT.2009.5272193
  • Filename
    5272193
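The abstract contrasts numerical-vector encoding of documents with string-vector encoding. As an illustration only, the sketch below builds both encodings for a toy three-document corpus: a bag-of-words term-frequency vector, whose length equals the corpus vocabulary and which is mostly zeros, and a string vector, taken here to be the d most frequent words of a document. The string-vector definition, the stop-word list, the tie-breaking rule, the toy corpus, and all function names are assumptions made for this sketch; it does not implement the NTC network itself and is not the paper's exact encoding.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for the example.
STOP_WORDS = frozenset({"a", "and", "as", "of", "the", "this"})

# Hypothetical toy corpus (not taken from Reuters-21578).
DOCS = [
    "crude oil prices rose as oil exports fell",
    "the bank raised interest rates and the bank cut lending",
    "wheat and corn exports rose this season",
]


def tokenize(text: str) -> list[str]:
    # Lowercase, keep alphabetic runs, and drop stop words.
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_vocabulary(docs: list[str]) -> list[str]:
    # One dimension per distinct word in the whole corpus, sorted for stability.
    return sorted({w for doc in docs for w in tokenize(doc)})


def numeric_vector(text: str, vocabulary: list[str]) -> list[int]:
    # Bag-of-words encoding: term frequency for every vocabulary word,
    # so most entries are zero for any single document.
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def string_vector(text: str, d: int) -> list[str]:
    # Assumed string-vector encoding: the d most frequent words, ordered by
    # descending frequency with ties broken alphabetically. The length is d
    # regardless of vocabulary size (shorter only if the document has
    # fewer than d distinct words).
    counts = Counter(tokenize(text))
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:d]


if __name__ == "__main__":
    vocabulary = build_vocabulary(DOCS)
    nv = numeric_vector(DOCS[0], vocabulary)
    print(len(vocabulary))                     # 15
    print(len(nv), sum(1 for x in nv if x == 0))  # 15 9
    print(string_vector(DOCS[0], 3))           # ['oil', 'crude', 'exports']
```

Even in this tiny corpus, the first document's numerical vector has 9 zeros out of 15 entries, and its length would keep growing as documents are added; the string vector stays at length d. How NTC then maps such a string vector to a category is described in the paper and is not modeled here.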