DocumentCode :
2253350
Title :
Modeling texts in semantic space and ensemble topic-models via boosting strategy
Author :
Yongliang, Wang ; Qiao, Guo
Author_Institution :
School of Automation, Beijing Institute of Technology, Beijing 100081
fYear :
2015
fDate :
28-30 July 2015
Firstpage :
3838
Lastpage :
3843
Abstract :
Topic models, especially Latent Dirichlet Allocation (LDA), are probabilistic graphical models used for feature selection and dimensionality reduction in text categorization tasks. Mapping from the word space to the latent topic space brings benefits, but inferring and estimating the model parameters becomes a new difficulty. This article first improves traditional LDA by introducing an ontology called CILIN, and then constructs a novel classification algorithm that combines models with different parameters via a boosting strategy. Naïve Bayes and Support Vector Machine are employed as weak classifiers, and a weighted method is proposed to improve accuracy by integrating the weak classifiers into a strong classifier in a more rational way. Experimental results show that the method performs well in both accuracy and generalization.
Keywords :
Accuracy; Boosting; Hidden Markov models; Measurement; Resource management; Semantics; Training; CILIN; Latent Dirichlet Allocation; Topic Model
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Control Conference (CCC), 2015 34th Chinese
Conference_Location :
Hangzhou, China
Type :
conf
DOI :
10.1109/ChiCC.2015.7260231
Filename :
7260231