DocumentCode :
3124486
Title :
Sparse Topic Model for text classification
Author :
Tao Liu
Author_Institution :
Sch. of Inf., Renmin Univ. of China, Beijing, China
Volume :
04
fYear :
2013
fDate :
14-17 July 2013
Firstpage :
1916
Lastpage :
1920
Abstract :
This paper presents a new text classification method, the Sparse Topic Model, which represents documents by sparse coding over topics. Topics carry more semantic information than individual words, so they are more effective features for representing documents. Topics are extracted from documents by LDA in an unsupervised way. Sparse coding is then applied to these topics to obtain a higher-level representation. We compare the Sparse Topic Model with traditional methods such as SVM, and the experimental results show that the proposed method achieves better performance, especially when the number of training examples is limited. The effect of the number of topics and the number of words per topic on performance is also investigated. Because the Sparse Topic Model is unsupervised, it is very useful for real applications.
Keywords :
pattern classification; text analysis; text detection; unsupervised learning; LDA; SVM; document feature representation; high-level representation; latent Dirichlet allocation; semantic information; sparse coding; sparse topic model; text classification method; topic number effect; topics extraction; unsupervised characteristic; word number per topic; Abstracts; Petroleum; Semantics; Support vector machines; Text categorization; Sparse coding; Text classification; Topic model;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics (ICMLC), 2013 International Conference on
Conference_Location :
Tianjin
Type :
conf
DOI :
10.1109/ICMLC.2013.6890908
Filename :
6890908