  • DocumentCode
    145131
  • Title
    An improved LDA algorithm for text classification
  • Author
    Dexin Zhao; Jinqun He; Jin Liu
  • Author_Institution
    Tianjin Key Lab. of Intell. Comput. & Novel Software Technol., Tianjin Univ. of Technol., Tianjin, China
  • Volume
    1
  • fYear
    2014
  • fDate
    26-28 April 2014
  • Firstpage
    217
  • Lastpage
    221
  • Abstract
    Latent Dirichlet Allocation (LDA) is a classic topic model that can extract latent topics from a large data corpus. This model assumes that if a document is relevant to a topic, then all tokens in the document are relevant to that topic. In this paper, we present an algorithm called gLDA for topic text classification, which adds a topic-category distribution parameter to LDA so that each document is generated from its most relevant category. Gibbs sampling is employed to conduct approximate inference, and experimental results on two datasets show the effectiveness of this method.
  • Keywords
    pattern classification; sampling methods; text analysis; Gibbs sampling; LDA algorithm; approximate inference; data corpus; gLDA; latent Dirichlet allocation; topic text classification; topic-category distribution parameter; Accuracy; Data models; Predictive models; Resource management; Text categorization; Training; Training data; LDA; text classification; topic model;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science, Electronics and Electrical Engineering (ISEEE), 2014 International Conference on
  • Conference_Location
    Sapporo
  • Print_ISBN
    978-1-4799-3196-5
  • Type
    conf
  • DOI
    10.1109/InfoSEEE.2014.6948100
  • Filename
    6948100
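The abstract says inference is carried out by Gibbs sampling but does not spell out how gLDA's topic-category distribution parameter enters the model. As a hedged illustration of the underlying technique only, here is a minimal collapsed Gibbs sampler for standard, unsupervised LDA in Python with NumPy. It is not the authors' gLDA; the function name `lda_gibbs`, the hyperparameter defaults (alpha = 0.1, beta = 0.01), the iteration count, and the toy corpus are all assumptions chosen for illustration.

```python
import numpy as np


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA (NOT the paper's gLDA).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    alpha, beta: symmetric Dirichlet priors (illustrative defaults, assumed).
    Returns (theta, phi): doc-topic and topic-word distributions.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # tokens in doc d assigned to topic k
    n_kw = np.zeros((n_topics, vocab_size))  # times word w is assigned to topic k
    n_k = np.zeros(n_topics)                 # total tokens assigned to topic k

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token's assignment from the counts.
                k = z[d][i]
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = k | z_-i, w), up to a constant.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1

    # Posterior-mean estimates of the doc-topic and topic-word distributions.
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + n_topics * alpha)
    phi = (n_kw + beta) / (n_k[:, None] + vocab_size * beta)
    return theta, phi


if __name__ == "__main__":
    # Hypothetical toy corpus: word ids 0-2 and 3-5 tend to co-occur.
    docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 1], [5, 3, 4, 4]]
    theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=6, n_iters=100)
    print(theta.round(2))
```

Each token's topic is resampled from its full conditional, proportional to (n_dk + alpha)(n_kw + beta)/(n_k + V*beta), after that token's own counts are removed. A supervised variant such as gLDA would additionally condition on document categories; the record does not describe how the paper does this, so the sketch leaves it out.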