Title :
Improved information gain-based feature selection for text categorization
Author :
Zhe Gao ; Yajing Xu ; Fanyu Meng ; Feng Qi ; Zhiqing Lin
Author_Institution :
Sch. of Inf. & Commun. Eng., Beijing Univ. of Posts & Telecommun., Beijing, China
Abstract :
Feature Selection (FS) is one of the most important issues in Text Categorization (TC). Empirical studies show that Information Gain (IG) is an effective FS method. However, because traditional IG pays little attention to term frequency and also counts the case in which a term does not appear, its results are not ideal. In this paper, we put forward an improved information gain-based feature selection method that uses term frequency information and a balance factor (IGTB) for statistical machine learning-based text categorization. Our feature selection method aims to pick out the key feature items in the text corpus precisely. Experiments on the Reuters-21578 and WebKB collections show that our method improves categorization accuracy compared with conventional information gain and other methods.
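For orientation, the conventional IG score that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of the standard textbook definition, not the IGTB method, whose formula is not given in this record. The function names (`entropy`, `information_gain`) and the toy corpus are assumptions made for the example. The sketch shows the two properties the abstract criticizes: a term is treated as present or absent regardless of how often it occurs, and the documents lacking the term contribute to the score.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)


def information_gain(docs, labels, term):
    """Conventional IG of `term` with respect to the class labels.

    docs   : list of token sets (binary presence only; term frequency is ignored)
    labels : list of class labels, one per document
    """
    classes = sorted(set(labels))
    all_counts = Counter(labels)
    with_term = Counter(lab for doc, lab in zip(docs, labels) if term in doc)
    without_term = Counter(lab for doc, lab in zip(docs, labels) if term not in doc)

    n = len(docs)
    n_with = sum(with_term.values())
    n_without = n - n_with

    h_class = entropy([all_counts[c] for c in classes])
    h_given_present = entropy([with_term[c] for c in classes])
    # The absence case also enters the score, as noted in the abstract.
    h_given_absent = entropy([without_term[c] for c in classes])

    return (h_class
            - (n_with / n) * h_given_present
            - (n_without / n) * h_given_absent)


# Hypothetical toy corpus, for illustration only.
toy_docs = [
    {"ball", "goal", "news"},
    {"ball", "team"},
    {"vote", "law", "news"},
    {"vote", "tax"},
]
toy_labels = ["sport", "sport", "politics", "politics"]

ig_ball = information_gain(toy_docs, toy_labels, "ball")
ig_news = information_gain(toy_docs, toy_labels, "news")
```

On this toy corpus, "ball" separates the two classes perfectly while "news" appears equally in both, so a feature selector ranking terms by this score would keep "ball" and drop "news".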
Keywords :
feature selection; learning (artificial intelligence); statistical analysis; text analysis; FS; IGTB; Reuters-21578 collections; TC; WebKB collections; categorization accuracy; information gain-based feature selection method; key feature items; statistical machine learning-based text categorization; term frequency information and balance factor; text corpus; Accuracy; Algorithm design and analysis; Classification algorithms; Educational institutions; Machine learning algorithms; Text categorization; Time-frequency analysis; Feature Selection; Information Gain; Text Categorization
Conference_Title :
Wireless Communications, Vehicular Technology, Information Theory and Aerospace & Electronic Systems (VITAE), 2014 4th International Conference on
Conference_Location :
Aalborg
Print_ISBN :
978-1-4799-4626-6
DOI :
10.1109/VITAE.2014.6934421