DocumentCode :
3335327
Title :
The research of an improved information gain method using distribution information of terms
Author :
Yang Yu-zhen ; Liu Pei-yu ; Zhu Zhen-fang ; Qiu Ye
Author_Institution :
Dept. of Inf. Sci. & Eng., Shandong Normal Univ., Ji'nan, China
Volume :
1
fYear :
2009
fDate :
14-16 Aug. 2009
Firstpage :
938
Lastpage :
941
Abstract :
A shortcoming of information gain is that it takes into account the case in which a term does not appear. In this paper, by analyzing the distribution information of terms, we find that when a term's distribution information inside a class becomes large, its distribution tends toward imbalance, and when a term is highly imbalanced, its distribution information among classes tends toward a smaller value. We therefore introduce the distribution information inside a class and the distribution information among classes to reduce the effect of term absence and to improve traditional information gain. Experiments show that the improved algorithm (GDI) performs better in some fields than traditional feature selection algorithms such as information gain.
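For orientation, the following is a minimal sketch of the traditional information gain score that the abstract sets out to improve, using the standard textbook definition. It is not the GDI algorithm, whose formula the abstract does not give. The function name `information_gain` and the toy document representation (a set of terms plus a class label) are assumptions made for illustration. The score is split into a term-present part and a term-absent part so that the contribution of term absence, which the abstract identifies as the weakness, is visible.

```python
import math


def information_gain(docs, term):
    """Standard information gain of `term` over labeled documents.

    docs: list of (set_of_terms, class_label) pairs (hypothetical toy format).
    Returns (ig, present_part, absent_part), where
    ig = H(C) + present_part + absent_part.
    """
    n = len(docs)
    classes = sorted({label for _, label in docs})

    def plogp(p):
        # Convention: 0 * log(0) is treated as 0.
        return p * math.log2(p) if p > 0 else 0.0

    all_labels = [label for _, label in docs]
    with_term = [label for terms, label in docs if term in terms]
    without_term = [label for terms, label in docs if term not in terms]

    # Class entropy H(C) = -sum_c P(c) log P(c)
    class_entropy = -sum(plogp(all_labels.count(c) / n) for c in classes)

    def weighted(labels):
        # P(subset) * sum_c P(c | subset) log P(c | subset)
        if not labels:
            return 0.0
        m = len(labels)
        return (m / n) * sum(plogp(labels.count(c) / m) for c in classes)

    present_part = weighted(with_term)    # contribution when the term appears
    absent_part = weighted(without_term)  # contribution when the term is absent
    return class_entropy + present_part + absent_part, present_part, absent_part


if __name__ == "__main__":
    corpus = [
        ({"goal", "match"}, "sports"),
        ({"goal", "team"}, "sports"),
        ({"stock", "market"}, "finance"),
        ({"stock", "bank"}, "finance"),
    ]
    ig, present, absent = information_gain(corpus, "goal")
    print(ig, present, absent)
```

In the standard formula the term-absent part carries the weight P(not t), so for a rare term it can dominate the score; that is the effect the abstract says the distribution information is introduced to reduce.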
Keywords :
information retrieval; text analysis; distribution information of terms; experimental verification; improved information gain method; Chaos; Entropy; Frequency; Gain measurement; Information analysis; Information science; Mutual information; Performance gain; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
IT in Medicine & Education, 2009. ITIME '09. IEEE International Symposium on
Conference_Location :
Jinan
Print_ISBN :
978-1-4244-3928-7
Electronic_ISBN :
978-1-4244-3930-0
Type :
conf
DOI :
10.1109/ITIME.2009.5236210
Filename :
5236210