DocumentCode
145131
Title
An improved LDA algorithm for text classification
Author
Dexin Zhao ; Jinqun He ; Jin Liu
Author_Institution
Tianjin Key Lab. of Intell. Comput. & Novel Software Technol., Tianjin Univ. of Technol., Tianjin, China
Volume
1
fYear
2014
fDate
26-28 April 2014
Firstpage
217
Lastpage
221
Abstract
Latent Dirichlet Allocation (LDA) is a classic topic model that can extract latent topics from a large data corpus. This model assumes that if a document is relevant to a topic, then all tokens in the document are relevant to that topic. In this paper, we present an algorithm called gLDA for topic text classification, which adds a topic-category distribution parameter to LDA so that each document is generated from its most relevant category. Gibbs sampling is employed to conduct approximate inference, and experimental results on two datasets show the effectiveness of this method.
Keywords
pattern classification; sampling methods; text analysis; Gibbs sampling; LDA algorithm; approximate inference; data corpus; gLDA; latent Dirichlet allocation; topic text classification; topic-category distribution parameter; Accuracy; Data models; Predictive models; Resource management; Text categorization; Training; Training data; LDA; text classification; topic model;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Science, Electronics and Electrical Engineering (ISEEE), 2014 International Conference on
Conference_Location
Sapporo
Print_ISBN
978-1-4799-3196-5
Type
conf
DOI
10.1109/InfoSEEE.2014.6948100
Filename
6948100