Title :
DiscWord: Learning Discriminative Topics
Author :
Yu Jiang ; Xian Li ; Weiyi Meng
Author_Institution :
Dept. of Comput. Sci., Binghamton Univ., Binghamton, NY, USA
Abstract :
Topic modeling is a popular research topic and is widely used in text-mining applications. Many researchers have observed that the topics learned by the LDA model, each a multinomial distribution over the word vocabulary, are often not intuitive in terms of human recognition and communication. Based on our observation, the most frequent words in a given topic are usually less important than words that are dedicated to it. In this paper, aiming at learning discriminative topics, we introduce a measure named word discriminability to capture a word's ability to identify different topics, and propose an iterative algorithm that is able to train and utilize word discriminability information during the topic learning process. Experimental results show that applying our method to the LDA topic model can improve its document classification accuracy significantly, that the learned topics are more discriminative, and that the top words of a topic are usually more representative.
Keywords :
iterative methods; learning (artificial intelligence); pattern classification; text analysis; word processing; DiscWord; LDA topic model; discriminative topic learning; document classification accuracy; iterative algorithm; latent Dirichlet allocation; topic learning process; word discriminability information; Accuracy; Computational modeling; Equations; Mathematical model; Measurement; Vectors; Vocabulary; discriminative topic; feature selection; topic model
Conference_Title :
Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
Conference_Location :
Warsaw
DOI :
10.1109/WI-IAT.2014.81