DocumentCode
1797724
Title
A robust framework for short text categorization based on topic model and integrated classifier
Author
Peng Wang ; Heng Zhang ; Yu-Fang Wu ; Bo Xu ; Hong-Wei Hao
Author_Institution
Inst. of Autom., Beijing, China
fYear
2014
fDate
6-11 July 2014
Firstpage
3534
Lastpage
3539
Abstract
In this paper, we propose a method for short text categorization using a topic model and an integrated classifier. To enrich the representation of short text, the Latent Dirichlet Allocation (LDA) model is used to extract latent topic information. For classification, we combine two classifiers to achieve high reliability. In particular, we train LDA models with varying numbers of topics using the Wikipedia corpus as an external knowledge base, and extend labeled Web snippets with the latent topics extracted by LDA. The enriched representations of the snippets are then used to train Maximum Entropy (MaxEnt) and support vector machine (SVM) classifiers separately. Finally, observing that the correct result is most likely to appear among the top two candidates selected by the MaxEnt classifier, we develop a novel scheme: if the gap between these two candidates is large enough, the MaxEnt prediction is considered reliable; otherwise, the SVM classifier is integrated with the MaxEnt classifier to make a comprehensive prediction. Experimental results show that our framework is effective and can outperform state-of-the-art techniques.
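The decision scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name integrated_predict, the gap_threshold default of 0.2, and the rule used when the gap is small (letting the SVM score choose between the two MaxEnt candidates) are all assumptions, since the abstract does not specify how the two classifiers are combined.

```python
from typing import Mapping


def integrated_predict(
    maxent_probs: Mapping[str, float],
    svm_scores: Mapping[str, float],
    gap_threshold: float = 0.2,  # hypothetical value, not taken from the paper
) -> str:
    """Pick a label from MaxEnt probabilities, consulting SVM scores when unsure."""
    if not maxent_probs:
        raise ValueError("maxent_probs must contain at least one label")

    # Rank labels by MaxEnt probability, highest first.
    ranked = sorted(maxent_probs, key=lambda label: maxent_probs[label], reverse=True)
    if len(ranked) == 1:
        return ranked[0]

    first, second = ranked[0], ranked[1]
    gap = maxent_probs[first] - maxent_probs[second]

    # Large gap: the MaxEnt prediction is treated as reliable.
    if gap >= gap_threshold:
        return first

    # Small gap (assumed combination rule): the SVM score decides between
    # the two MaxEnt candidates only; a tie keeps the MaxEnt ordering.
    return max((first, second), key=lambda label: svm_scores[label])
```

With a hypothetical input such as MaxEnt probabilities {"sports": 0.45, "health": 0.40, "business": 0.15}, the gap of 0.05 falls below the threshold, so the SVM scores for "sports" and "health" decide the outcome; a label outside the top two is never chosen, whatever its SVM score.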
Keywords
Internet; learning (artificial intelligence); pattern classification; support vector machines; text analysis; LDA model; MaxEnt classifier; SVM classifier; Web snippets; Wikipedia corpus; integrated classifier; latent Dirichlet allocation model; latent topic information extraction; maximum entropy; short text categorization; short text representation; support vector machine; topic model; Electronic publishing; Encyclopedias; Semantics; Text categorization
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks (IJCNN), 2014 International Joint Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4799-6627-1
Type
conf
DOI
10.1109/IJCNN.2014.6889589
Filename
6889589