Title :
Study of text classification methods for data sets with huge features
Author :
Wei, Guiying ; Gao, Xuedong ; Wu, Sen
Author_Institution :
Sch. of Econ. & Manage., Univ. of Sci. & Technol. Beijing, Beijing, China
Abstract :
Interest in text classification has grown rapidly over the past few years. This paper surveys the main approaches to text classification and discusses the key techniques, including text models, feature selection methods, and classification algorithms. The work focuses on the implementation of a text classification system that uses Mutual Information for feature selection together with the K-Nearest Neighbor and Support Vector Machine algorithms. Experimental results on the Reuters collection are also presented. They show that Mutual Information is an efficient dimension reduction method for text data sets with a very large number of features.
Keywords :
feature extraction; pattern classification; support vector machines; text analysis; K-Nearest Neighbor algorithm; Reuters collection; dimension reduction method; feature selection methods; huge feature data sets; mutual information; text classification algorithms; text model; Indexing; K-Nearest Neighbor; Text classification; feature selection
Conference_Title :
Industrial and Information Systems (IIS), 2010 2nd International Conference on
Conference_Location :
Dalian
Print_ISBN :
978-1-4244-7860-6
DOI :
10.1109/INDUSIS.2010.5565817