Improving the k-NN and applying it to Chinese text classification

Author

Yuan, Fang ; Yang, Liu ; Yu, Ge

Author_Institution

Coll. of Math. & Comput. Sci., Hebei Univ., China

Volume

3

fYear

2005

fDate

18-21 Aug. 2005

Firstpage

1547

Abstract

To address the problems that arise when applying k-NN to Chinese text classification, this paper proposes several improvements to k-NN. Word segmentation based on dictionaries and statistics can increase classification accuracy and reduce the number of dimensions. Applying a genetic algorithm to learn the value of k can make classification more automatic. A gradual classification mode improves classification efficiency. Experiments show that these improvements to k-NN can improve the efficiency of Chinese text classification while maintaining high accuracy.
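
As a rough illustration of the baseline the abstract builds on, the following is a minimal sketch of plain k-NN text classification over term-count vectors with cosine similarity. It is not the paper's method: the dictionary-and-statistics word segmentation, the genetic algorithm, and the gradual classification mode are not reproduced. The documents are assumed to be already segmented into tokens, the toy `TRAINING` data and all function names are hypothetical, and `choose_k` uses simple leave-one-out accuracy as a stand-in for the genetic search over k.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(tokens, training, k):
    """Label a token list by majority vote among its k most similar documents."""
    query = Counter(tokens)
    ranked = sorted(training, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def choose_k(training, candidates):
    """Pick k by leave-one-out accuracy (a stand-in, NOT a genetic algorithm)."""
    def accuracy(k):
        hits = 0
        for i, (vec, label) in enumerate(training):
            rest = training[:i] + training[i + 1:]
            hits += knn_classify(list(vec.elements()), rest, k) == label
        return hits / len(training)
    return max(candidates, key=accuracy)


# Hypothetical, already-segmented toy documents (assumption for illustration).
TRAINING = [
    (Counter(["股票", "市场", "上涨"]), "finance"),
    (Counter(["银行", "利率", "市场"]), "finance"),
    (Counter(["股票", "银行", "投资"]), "finance"),
    (Counter(["球队", "比赛", "胜利"]), "sports"),
    (Counter(["比赛", "球员", "进球"]), "sports"),
    (Counter(["球队", "球员", "冠军"]), "sports"),
]

if __name__ == "__main__":
    k = choose_k(TRAINING, [1, 3])
    print(k, knn_classify(["股票", "市场", "投资"], TRAINING, k))
```

Because every query is compared against every training document, the cost of this baseline grows with both corpus size and vocabulary size, which is the kind of inefficiency the abstract's dimension reduction and gradual classification mode are aimed at.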

Keywords

classification; genetic algorithms; text analysis; Chinese text classification; classification automatization; genetic algorithm; k-nearest neighbor; word segmentation; Computer science; Educational institutions; Electronic mail; Information science; Internet; Mathematics; Statistics; Testing; Text categorization; gradual classification mode; k-Nearest Neighbor method; text preprocessing

fLanguage

English

Publisher

IEEE

Conference_Title

Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on

Conference_Location

Guangzhou, China

Print_ISBN

0-7803-9091-1

Type

conf

DOI

10.1109/ICMLC.2005.1527190

Filename

1527190