Title :
A robust framework for short text categorization based on topic model and integrated classifier
Author :
Peng Wang ; Heng Zhang ; Yu-Fang Wu ; Bo Xu ; Hong-Wei Hao
Author_Institution :
Inst. of Autom., Beijing, China
Abstract :
In this paper, we propose a method for short text categorization that combines a topic model with an integrated classifier. To enrich the representation of short text, the Latent Dirichlet Allocation (LDA) model is used to extract latent topic information. For classification, we combine two classifiers to achieve high reliability. Specifically, we train LDA models with varying numbers of topics using the Wikipedia corpus as an external knowledge base, and extend labeled Web snippets with the latent topics extracted by LDA. The enriched representations of the snippets are then used to train Maximum Entropy (MaxEnt) and support vector machine (SVM) classifiers separately. Finally, observing that the correct category most likely appears among the top two candidates selected by the MaxEnt classifier, we develop a novel scheme: if the gap between these two candidates is large enough, the MaxEnt prediction is considered reliable; otherwise, the SVM classifier is integrated with the MaxEnt classifier to make a comprehensive prediction. Experimental results show that our framework is effective and can outperform state-of-the-art techniques.
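The top-two decision scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `integrated_predict`, the default `gap_threshold` of 0.3, the assumption that SVM scores are already normalized to [0, 1], and the rule of summing the two classifiers' scores over the top two candidates are all hypothetical, since the abstract does not say how the gap is measured or how the two classifiers are combined.

```python
def integrated_predict(maxent_probs, svm_scores, gap_threshold=0.3):
    """Pick a label from MaxEnt output, falling back to a MaxEnt+SVM vote.

    maxent_probs: dict mapping label -> MaxEnt probability.
    svm_scores:   dict mapping label -> SVM score, assumed normalized to [0, 1].
    gap_threshold: hypothetical cutoff for trusting MaxEnt alone.
    """
    # Rank labels by MaxEnt probability and keep the top two candidates.
    ranked = sorted(maxent_probs, key=maxent_probs.get, reverse=True)
    first, second = ranked[0], ranked[1]

    # If the top candidate clearly dominates, treat MaxEnt as reliable.
    gap = maxent_probs[first] - maxent_probs[second]
    if gap >= gap_threshold:
        return first

    # Otherwise combine both classifiers over the two candidates only.
    # Summing the scores is an assumed combination rule for illustration.
    combined = {c: maxent_probs[c] + svm_scores[c] for c in (first, second)}
    return max((first, second), key=combined.get)


# Confident MaxEnt prediction: the gap is 0.5, so the SVM is never consulted.
confident = {"sports": 0.7, "business": 0.2, "health": 0.1}
svm_any = {"sports": 0.1, "business": 0.8, "health": 0.1}
print(integrated_predict(confident, svm_any))  # sports

# Ambiguous MaxEnt prediction: the gap is 0.05, so the SVM breaks the tie.
ambiguous = {"sports": 0.45, "business": 0.40, "health": 0.15}
svm_vote = {"sports": 0.2, "business": 0.7, "health": 0.1}
print(integrated_predict(ambiguous, svm_vote))  # business
```

Restricting the fallback to the top two MaxEnt candidates follows the abstract's observation that the correct category usually lies among them; the threshold itself would need to be tuned on held-out data.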
Keywords :
Internet; learning (artificial intelligence); pattern classification; support vector machines; text analysis; LDA model; MaxEnt classifier; SVM classifier; Web snippets; Wikipedia corpus; integrated classifier; latent Dirichlet allocation model; latent topic information extraction; maximum entropy; short text categorization; short text representation; support vector machine; topic model; Electronic publishing; Encyclopedias; Semantics; Text categorization
Conference_Title :
Neural Networks (IJCNN), 2014 International Joint Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-6627-1
DOI :
10.1109/IJCNN.2014.6889589