Title :
A Novel Text Representation Model for Text Classification
Author :
Wang, Jun ; Zhou, Yiming
Author_Institution :
Sch. of Comput. Sci. & Eng., Beihang Univ., Beijing
Abstract :
In text classification, a text is usually represented as a sequence of terms. When the number of terms becomes very large, existing text categorization methods become very time-consuming. In this paper we present a novel text representation model for text classification that greatly reduces the required resources. The model represents a text with several features, each corresponding to a theme that emerges from a set of related articles. We also introduce an efficient way to build the model. The proposed model has been applied to a naive Bayes classifier, and experiments on the Reuters-21578 corpus show that efficiency is greatly improved without sacrificing classification accuracy, even when the dimension of the input space is significantly reduced.
Keywords :
Bayes methods; classification; text analysis; Reuters-21578 corpus; classification accuracy; naive Bayes classifier; text categorization tasks; text classification; text representation model; Clustering algorithms; Computer science; Indexing; Information retrieval; Intelligent networks; Intelligent systems; Natural language processing; Support vector machine classification; Support vector machines; Text categorization
Conference_Title :
Intelligent Networks and Intelligent Systems, 2008. ICINIS '08. First International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-0-7695-3391-9
Electronic_ISBN :
978-0-7695-3391-9
DOI :
10.1109/ICINIS.2008.21