  • DocumentCode
    1797724
  • Title
    A robust framework for short text categorization based on topic model and integrated classifier
  • Author
    Peng Wang; Heng Zhang; Yu-Fang Wu; Bo Xu; Hong-Wei Hao
  • Author_Institution
    Inst. of Autom., Beijing, China
  • fYear
    2014
  • fDate
    6-11 July 2014
  • Firstpage
    3534
  • Lastpage
    3539
  • Abstract
    In this paper, we propose a method for short text categorization using a topic model and an integrated classifier. To enrich the representation of short texts, the Latent Dirichlet Allocation (LDA) model is used to extract latent topic information. For classification, we combine two classifiers to achieve high reliability. Specifically, we train LDA models with varying numbers of topics using the Wikipedia corpus as an external knowledge base, and extend labeled Web snippets with the latent topics extracted by LDA. The enriched representations of the snippets are then used to train Maximum Entropy (MaxEnt) and support vector machine (SVM) classifiers separately. Finally, observing that the correct label most likely appears among the top two candidates selected by the MaxEnt classifier, we develop a novel scheme: if the gap between these two candidates is large enough, the MaxEnt prediction is considered reliable; otherwise, the SVM classifier is integrated with the MaxEnt classifier to make a joint prediction. Experimental results show that our framework is effective and can outperform state-of-the-art techniques.
  • Keywords
    Internet; learning (artificial intelligence); pattern classification; support vector machines; text analysis; LDA model; MaxEnt classifier; SVM classifier; Web snippets; Wikipedia corpus; integrated classifier; latent Dirichlet allocation model; latent topic information extraction; maximum entropy; short text categorization; short text representation; support vector machine; topic model; Electronic publishing; Encyclopedias; Semantics; Support vector machines; Text categorization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), 2014 International Joint Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4799-6627-1
  • Type
    conf
  • DOI
    10.1109/IJCNN.2014.6889589
  • Filename
    6889589
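The reliability scheme described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `integrated_predict`, the threshold `gap_threshold`, the weight `weight`, and the rule used to combine MaxEnt and SVM scores when the gap is small are all hypothetical (the abstract specifies none of them), and SVM scores are assumed to be already normalized to [0, 1].

```python
from typing import Dict


def integrated_predict(
    maxent_probs: Dict[str, float],
    svm_scores: Dict[str, float],
    gap_threshold: float = 0.3,  # hypothetical value; the abstract gives none
    weight: float = 0.5,  # hypothetical MaxEnt weight in the combination
) -> str:
    """Pick a label from MaxEnt output, falling back to MaxEnt + SVM when unsure.

    maxent_probs: class -> MaxEnt posterior probability.
    svm_scores:   class -> SVM score, assumed already normalized to [0, 1].
    """
    if not maxent_probs:
        raise ValueError("maxent_probs must contain at least one class")

    # Rank classes by MaxEnt probability and keep the top two candidates.
    ranked = sorted(maxent_probs, key=maxent_probs.get, reverse=True)
    if len(ranked) < 2:
        return ranked[0]
    first, second = ranked[0], ranked[1]

    # Large gap between the top two candidates: trust MaxEnt alone.
    if maxent_probs[first] - maxent_probs[second] >= gap_threshold:
        return first

    # Small gap: combine both classifiers, but only over the two MaxEnt
    # candidates (assumed combination rule: weighted sum of scores).
    def combined(label: str) -> float:
        return weight * maxent_probs[label] + (1 - weight) * svm_scores[label]

    return max((first, second), key=combined)


# Confident MaxEnt output (gap 0.5): the SVM scores are ignored.
print(integrated_predict({"sports": 0.70, "business": 0.20, "health": 0.10},
                         {"sports": 0.10, "business": 0.80, "health": 0.10}))  # sports
# Close call (gap 0.05): the combined score favors "business".
print(integrated_predict({"sports": 0.45, "business": 0.40, "health": 0.15},
                         {"sports": 0.20, "business": 0.70, "health": 0.10}))  # business
```

Restricting the fallback to the two MaxEnt candidates mirrors the abstract's observation that the correct label most likely appears among them, so even a class the SVM scores highly is not chosen unless MaxEnt also ranks it in its top two.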