  • DocumentCode
    714181
  • Title
    Performance of using LDA for Chinese news text classification
  • Author
    Xiaojun Wu ; Liying Fang ; Pu Wang ; Nan Yu
  • Author_Institution
    Dept. of Electron. Inf. & Control Eng., Beijing Univ. of Technol., Beijing, China
  • fYear
    2015
  • fDate
    3-6 May 2015
  • Firstpage
    1260
  • Lastpage
    1264
  • Abstract
    Chinese text classification is always challenging, especially when the data are high-dimensional and sparse. In this paper, we are interested in text representation and dimension reduction for Chinese text classification. First, we introduce a topic model, Latent Dirichlet Allocation (LDA), and use it as a dimension reduction method. Second, we choose the Support Vector Machine (SVM) as the classification algorithm. Next, a method of text classification based on LDA and SVM is described. Finally, we experiment on a large collection of Chinese documents. Comparing the LDA method with the traditional TF-IDF method, the experimental results show that the LDA method gives better results in both classification accuracy and running time. (A minimal LDA + SVM pipeline sketch follows this record.)
  • Keywords
    information resources; natural language processing; pattern classification; support vector machines; text analysis; Chinese news text classification; LDA; SVM; TF-IDF method; dimension reduction; documents; latent Dirichlet allocation; support vector machine; text representation; topic model; Accuracy; Classification algorithms; Numerical models; Resource management; Support vector machines; Text categorization; Training; LDA; dimension reduction; text classification;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    2015 IEEE 28th Canadian Conference on Electrical and Computer Engineering (CCECE)
  • Conference_Location
    Halifax, NS
  • ISSN
    0840-7789
  • Print_ISBN
    978-1-4799-5827-6
  • Type
    conf
  • DOI
    10.1109/CCECE.2015.7129459
  • Filename
    7129459
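  • Example_Sketch
    A minimal sketch of the pipeline described in the abstract, assuming a
    scikit-learn style implementation: LatentDirichletAllocation for dimension
    reduction and LinearSVC as the SVM classifier. The toy corpus, label names,
    and topic count below are illustrative assumptions, not the paper's data,
    parameters, or code.

        # Hedged sketch: LDA topic features feeding an SVM, as outlined in the abstract.
        # Assumes documents are already word-segmented (e.g. with jieba) and space-joined.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.decomposition import LatentDirichletAllocation
        from sklearn.svm import LinearSVC
        from sklearn.pipeline import Pipeline

        # Toy stand-in corpus; the paper uses a large set of Chinese news documents.
        train_docs = ["stock market rises on strong earnings",
                      "home team wins the final match"]
        train_labels = ["finance", "sports"]

        lda_svm = Pipeline([
            ("counts", CountVectorizer()),  # bag-of-words term counts
            # Project counts into a low-dimensional topic space; the number of
            # topics would be tuned on real data, 2 is only for this toy corpus.
            ("lda", LatentDirichletAllocation(n_components=2, random_state=0)),
            ("svm", LinearSVC()),  # linear SVM trained on the topic vectors
        ])

        lda_svm.fit(train_docs, train_labels)
        print(lda_svm.predict(["the final match tonight"]))

    The TF-IDF baseline mentioned in the abstract would replace the counts and
    LDA steps with a TfidfVectorizer feeding the same SVM.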