Title :
Feature selection for Chinese Text Categorization based on improved particle swarm optimization
Author :
Jin, Yaohong ; Xiong, Wen ; Wang, Cong
Author_Institution :
Inst. of Chinese Inf. Process., Beijing Normal Univ., Beijing, China
Abstract :
Feature selection is an important preprocessing step in Chinese text categorization: it reduces the high dimensionality of the feature space and, unlike feature extraction, keeps the reduced feature set comprehensible. A novel criterion for coarse feature filtering is proposed, which combines the strengths of term frequency-inverse document frequency (TF-IDF) as an inner-class measure and CHI-square as an inter-class measure. A new feature selection method for Chinese text categorization based on swarm intelligence is also presented: it uses improved particle swarm optimization to perform fine-grained feature selection on the results of the coarse-grained filtering, and uses a support vector machine to evaluate feature subsets, taking the evaluations as the fitness of the particles. Experiments on the Fudan University Chinese Text Classification Corpus show that the new filtering criterion yields higher classification accuracy and that the new feature selection method achieves an effective feature reduction ratio.
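The wrapper stage described in the abstract (particles encode feature subsets, and a classifier's score serves as fitness) can be illustrated with a minimal sketch. This is a standard binary particle swarm optimizer with a sigmoid transfer function, not the paper's improved variant, whose details are not given in this record. The function names, parameter defaults, and the toy fitness are all assumptions made for illustration; in the paper's setting the fitness would be the evaluation of a support vector machine trained on the selected features.

```python
import math
import random

def binary_pso(n_features, fitness, n_particles=10, n_iters=30,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Standard binary PSO (hypothetical helper, not the paper's variant).

    Each particle is a 0/1 mask over features; `fitness` maps a mask to a
    score to maximize. Returns the best mask found and its score.
    """
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]

    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                # Velocity update: inertia + cognitive pull + social pull.
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp to avoid saturation
                vel[i][d] = v
                # Sigmoid of the velocity is the probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

# Toy stand-in for the SVM-based fitness (assumption): features 0-3 are
# "useful", and every selected feature costs a small size penalty.
USEFUL = {0, 1, 2, 3}

def toy_fitness(mask):
    gain = sum(bit for i, bit in enumerate(mask) if i in USEFUL)
    return gain - 0.1 * sum(mask)

if __name__ == "__main__":
    best_mask, best_score = binary_pso(8, toy_fitness, seed=1)
    print(best_mask, best_score)
```

To approximate the setup in the abstract, `toy_fitness` would be replaced by a function that trains and scores a classifier on the columns selected by the mask, applied to the features that survive the coarse filtering step.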
Keywords :
document handling; information retrieval; natural language processing; particle swarm optimisation; support vector machines; text analysis; CHI-square; Chinese text categorization; coarse grain filtering; feature extraction; feature selection method; frequency-inverse document frequency; particle swarm optimization; support vector machine; swarm intelligence; Art; Computers; Educational institutions; Military computing; Support vector machines; Feature selection; particle swarm optimization; support vector machine; text categorization; text mining;
Conference_Title :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6896-6
DOI :
10.1109/NLPKE.2010.5587844