• DocumentCode
    3440718
  • Title
    Evaluation of Stability and Similarity of Latent Dirichlet Allocation
  • Author
    Jun Tang ; Ruilong Huo ; Jiali Yao
  • Author_Institution
    China COE, Pivotal, Beijing, China
  • fYear
    2013
  • fDate
    3-4 Dec. 2013
  • Firstpage
    78
  • Lastpage
    83
  • Abstract
    Latent Dirichlet Allocation (LDA) is an unsupervised statistical method that models documents, discovers latent semantic topics in a large set of documents, and categorizes the documents into the learned topics. In this paper, we first introduce LDA and its distributed version, Parallel LDA (PLDA), along with some popular implementations. We then propose a systematic solution for evaluating the stability and similarity of the trained models and classification results of LDA/PLDA. We address three key challenges within the evaluation solution: (i) topic matching in the Kullback-Leibler (KL) divergence calculation; (ii) calculation of stability using KL divergence, and interpretation of the relationship between KL divergence and the stability of the trained model and the classification results; and (iii) calculation and evaluation of the similarity of trained models and classification results. Finally, we experiment with real-life datasets to show that our solution is sufficient and efficient.
  • Keywords
    data mining; distributed processing; document handling; pattern classification; statistical analysis; unsupervised learning; KL divergence calculation; Kullback-Leibler divergence calculation; LDA classification; PLDA classification; distributed parallel LDA; document modelling; latent Dirichlet allocation similarity evaluation; latent Dirichlet allocation stability evaluation; latent semantic topic discovery; topic matching; trained model similarity calculation; trained model similarity evaluation; unsupervised statistical method; Classification algorithms; Computational modeling; Electromagnetic compatibility; Google; Measurement; Stability analysis; Systematics; LDA; evaluation; similarity; stability
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Engineering (WCSE), 2013 Fourth World Congress on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    978-1-4799-2882-8
  • Type
    conf
  • DOI
    10.1109/WCSE.2013.17
  • Filename
    6754267