DocumentCode :
186044
Title :
Application of knowledge gain on multi-type feature space in microblog user classification
Author :
Xu Yan
Author_Institution :
Inst. of Comput. Technol., Beijing Language & Culture Univ., Beijing, China
fYear :
2014
fDate :
22-24 Oct. 2014
Firstpage :
340
Lastpage :
345
Abstract :
Feature selection plays an important role in text categorization. Classic feature selection methods such as document frequency (DF), information gain (IG), and mutual information (MI) are commonly applied in text categorization, but they usually take only plain text into account. Knowledge Gain (KG) is a new feature selection method proposed in my previous paper. It measures an attribute's importance based on Rough Set theory. Experiments show that it performs well in traditional text classification and has an obvious advantage in recall rate on unbalanced corpora. Unlike traditional text, microblogs are characterized by short texts and special network structures, including the user social network and the behavior network. As a result, microblog users carry less text information and more behavioral and social information, so the classic feature selection algorithms, which were designed for text features, are not applicable. In this paper, we validate that KG, which is based on rough set knowledge, can consistently select optimal features in the multi-type feature space of microblog user classification. Experiments show that it performs better in multi-type feature selection than other classic feature selection methods.
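The abstract does not state the KG formula, so the sketch below only illustrates the general idea of measuring an attribute's importance with Rough Set theory. It uses the classic Pawlak dependency degree gamma_B(D) = |POS_B(D)| / |U| and the attribute significance sig(a) = gamma_C(D) - gamma_{C - {a}}(D); this is an assumption for illustration, not the author's KG definition. The feature names (`keyword`, `followers`, `activity`) and the six-user table are hypothetical toy data standing in for text, social, and behavior features.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]


def partition(rows: Sequence[Row], attrs: Sequence[str]) -> List[List[int]]:
    """Group object indices by their values on attrs (indiscernibility classes)."""
    blocks: Dict[tuple, List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows: Sequence[Row], cond: Sequence[str], decision: str) -> float:
    """gamma_B(D): fraction of objects lying in the positive region POS_B(D)."""
    if not rows:
        return 0.0
    pos = 0
    for block in partition(rows, cond):
        labels = {rows[i][decision] for i in block}
        if len(labels) == 1:  # block falls entirely inside one decision class
            pos += len(block)
    return pos / len(rows)


def significance(rows: Sequence[Row], cond: Sequence[str], decision: str, attr: str) -> float:
    """Classic rough-set significance: drop in dependency when attr is removed."""
    full = dependency(rows, cond, decision)
    reduced = dependency(rows, [a for a in cond if a != attr], decision)
    return full - reduced


# Hypothetical toy decision table: one text, one social, one behavior feature.
COND = ["keyword", "followers", "activity"]
ROWS: List[Row] = [
    {"keyword": 1, "followers": "high", "activity": "high", "label": "ent"},
    {"keyword": 0, "followers": "high", "activity": "high", "label": "ent"},
    {"keyword": 1, "followers": "low", "activity": "high", "label": "ent"},
    {"keyword": 0, "followers": "low", "activity": "low", "label": "other"},
    {"keyword": 1, "followers": "low", "activity": "low", "label": "other"},
    {"keyword": 0, "followers": "high", "activity": "low", "label": "ent"},
]

if __name__ == "__main__":
    for attr in COND:
        print(attr, round(significance(ROWS, COND, "label", attr), 3))
```

On this toy table `keyword` gets significance 0, since the social and behavior features alone already separate the classes, while `followers` and `activity` each get 1/3. This mirrors, in miniature, the abstract's point that in a multi-type feature space non-text features can matter more than text features.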
Keywords :
feature selection; pattern classification; rough set theory; social networking (online); text analysis; attribute importance; behavior network; document frequency; feature selection algorithm; feature selection method; information gain; knowledge gain; microblog characteristics; microblog user classification; multitype feature space; mutual information; plain text; recall rate; rough set knowledge; rough set theory; short text; social information; special structure network; text categorization; text classification; text feature; text information; user social network; Blogs; Classification algorithms; Entertainment industry; Feature extraction; Set theory; Social network services; Text categorization; feature selection; knowledge gain; microblog; text classification; user classification;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Granular Computing (GrC), 2014 IEEE International Conference on
Conference_Location :
Noboribetsu
Type :
conf
DOI :
10.1109/GRC.2014.6982861
Filename :
6982861