Study of text classification methods for data sets with huge features

Author

Wei, Guiying ; Gao, Xuedong ; Wu, Sen

Author_Institution

Sch. of Econ. & Manage., Univ. of Sci. & Technol. Beijing, Beijing, China

Volume

1

fYear

2010

fDate

10-11 July 2010

Firstpage

433

Lastpage

436

Abstract

Text classification has attracted growing interest over the past few years. In this paper we review the main approaches that have been taken to text classification. The key techniques, including the text model, feature selection methods and classification algorithms, are discussed. This work focuses on the implementation of a text classification system based on Mutual Information, the K-Nearest Neighbor algorithm and the Support Vector Machine. Experimental results on the Reuters collection are also presented. They show that Mutual Information is an efficient dimension reduction method for text data sets with huge numbers of features.
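The abstract names Mutual Information (MI) as the dimension reduction step and K-Nearest Neighbor (KNN) as one of the classifiers. The Python sketch below shows one way the two can fit together. It is not the authors' implementation. Assumptions: MI is taken to be the expected mutual information between term presence and class membership (the exact MI variant used in the paper is not stated here), a term's score is its maximum over classes, documents are term-frequency vectors compared by cosine similarity, and neighbours cast similarity-weighted votes. All function names are hypothetical, the SVM classifier is not sketched, and the four-document corpus is a made-up toy, not Reuters data.

```python
import math
from collections import Counter


def expected_mutual_information(doc_sets, labels, term, cls):
    """Expected MI (in bits) between presence of `term` and membership in `cls`.

    Assumed MI variant for illustration; not taken from the paper.
    """
    n = len(doc_sets)
    # 2x2 contingency counts: first digit = term present, second = in class.
    n11 = n10 = n01 = n00 = 0
    for terms, label in zip(doc_sets, labels):
        has_term, in_class = term in terms, label == cls
        if has_term and in_class:
            n11 += 1
        elif has_term:
            n10 += 1
        elif in_class:
            n01 += 1
        else:
            n00 += 1
    mi = 0.0
    # Each cell: (joint count, term-row marginal, class-column marginal).
    for n_tc, n_t, n_c in (
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ):
        if n_tc:  # empty cells contribute 0 (0 * log 0 is taken as 0)
            mi += (n_tc / n) * math.log2(n * n_tc / (n_t * n_c))
    return mi


def select_features(docs, labels, k):
    """Keep the k terms with the highest MI score (max over classes)."""
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    classes = set(labels)
    score = {
        t: max(expected_mutual_information(doc_sets, labels, t, c) for c in classes)
        for t in vocab
    }
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]


def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors (Counters)."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train_docs, train_labels, query, features, k=3):
    """Classify `query` by a similarity-weighted vote of its k nearest neighbours."""
    keep = set(features)

    def vectorize(tokens):
        # Term-frequency vector restricted to the selected features.
        return Counter(t for t in tokens if t in keep)

    q = vectorize(query)
    ranked = sorted(
        ((cosine(q, vectorize(d)), lbl) for d, lbl in zip(train_docs, train_labels)),
        reverse=True,
    )
    votes = Counter()
    for sim, lbl in ranked[:k]:
        votes[lbl] += sim
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy corpus, for illustration only.
    train = [
        ["wheat", "grain", "export", "price"],
        ["wheat", "harvest", "grain"],
        ["oil", "crude", "barrel", "price"],
        ["oil", "opec", "crude"],
    ]
    labels = ["grain", "grain", "crude", "crude"]
    feats = select_features(train, labels, k=4)
    print(feats)  # ['crude', 'grain', 'oil', 'wheat']
    print(knn_predict(train, labels, ["crude", "oil", "price"], feats))  # crude
```

On this toy corpus each of the four class-specific terms scores exactly 1 bit, while "price", which occurs once in each class, scores 0 and is discarded. The selection therefore keeps only discriminative terms before KNN runs, a miniature version of the dimension reduction role the abstract assigns to MI.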

Keywords

feature extraction; pattern classification; support vector machines; text analysis; K-Nearest Neighbor algorithm; Reuters collection; dimension reduction method; feature selection methods; huge feature data sets; mutual information; support vector machine; text classification algorithms; text model; Indexing; Support vector machines; K-Nearest Neighbor; Mutual Information; Text classification; feature selection;

fLanguage

English

Publisher

IEEE

Conference_Title

Industrial and Information Systems (IIS), 2010 2nd International Conference on

Conference_Location

Dalian

Print_ISBN

978-1-4244-7860-6

Type

conf

DOI

10.1109/INDUSIS.2010.5565817

Filename

5565817