Title :
Sparse Topic Model for text classification
Author_Institution :
Sch. of Inf., Renmin Univ. of China, Beijing, China
Abstract :
This paper presents a new text classification method, the Sparse Topic Model, which represents documents by sparse codes over topics. Topics carry more semantic information than individual words, so they are more effective for document feature representation. Topics are extracted from documents by latent Dirichlet allocation (LDA) in an unsupervised way. Sparse coding is then applied to these topics to discover a higher-level representation. We compare the Sparse Topic Model with traditional methods such as SVM, and the experimental results show that the proposed method achieves better performance, especially when the number of training examples is limited. The effect of the number of topics and the number of words per topic on performance is also investigated. Because of its unsupervised nature, the Sparse Topic Model is well suited to real applications.
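The pipeline outlined in the abstract (LDA topics, then sparse coding, then a classifier) might be sketched as follows. This is only an illustrative sketch and not the authors' implementation: the use of scikit-learn, the toy corpus, the choice of normalized topic-word distributions as dictionary atoms, orthogonal matching pursuit as the sparse coder, and a linear SVM on top of the codes are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation, SparseCoder
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Toy corpus and labels (hypothetical data, for illustration only).
docs = [
    "oil price crude barrel market",
    "crude oil export barrel supply",
    "oil market supply price demand",
    "match team goal league player",
    "player team season coach goal",
    "league match coach season win",
]
labels = np.array([0, 0, 0, 1, 1, 1])

n_topics = 4    # number of LDA topics (assumed value)
n_nonzero = 2   # maximum active topics per document (assumed value)

# Step 1: bag-of-words counts, then unsupervised topic extraction with LDA.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
lda.fit(counts)

# Step 2: treat each topic-word distribution as a dictionary atom.
# Atoms are scaled to unit L2 norm, as sparse coders usually expect.
topics = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
atoms = topics / np.linalg.norm(topics, axis=1, keepdims=True)

# Step 3: sparse-code each document's normalized term frequencies
# over the topic atoms using orthogonal matching pursuit.
tf = counts.toarray().astype(float)
tf /= tf.sum(axis=1, keepdims=True)
coder = SparseCoder(
    dictionary=atoms,
    transform_algorithm="omp",
    transform_n_nonzero_coefs=n_nonzero,
)
codes = coder.transform(tf)  # shape: (number of documents, n_topics)

# Step 4: train a simple classifier on the sparse topic codes.
clf = LinearSVC().fit(codes, labels)
predictions = clf.predict(codes)
```

Only the final classifier uses labels here; the topic extraction and coding steps are unsupervised, which is the property the abstract emphasizes. The values of `n_topics` and `n_nonzero` are placeholders and would need tuning on real data.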
Keywords :
pattern classification; text analysis; text detection; unsupervised learning; LDA; SVM; document feature representation; high-level representation; latent Dirichlet allocation; semantic information; sparse coding; sparse topic model; text classification method; topic number effect; topics extraction; unsupervised characteristic; word number per topic; Abstracts; Petroleum; Semantics; Support vector machines; Text categorization; Sparse coding; Text classification; Topic model;
Conference_Title :
Machine Learning and Cybernetics (ICMLC), 2013 International Conference on
Conference_Location :
Tianjin
DOI :
10.1109/ICMLC.2013.6890908