• DocumentCode
    35428
  • Title

    Self-adaptive topic model: A solution to the problem of “rich topics get richer”

  • Author

    Ying Fang ; Heyan Huang ; Ping Jian ; Xin Xin ; Chong Feng

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Beijing Inst. of Technol., Beijing, China
  • Volume
    11
  • Issue
    12
  • fYear
    2014
  • fDate
    Dec. 2014
  • Firstpage
    35
  • Lastpage
    43
  • Abstract
    The problem of “rich topics get richer” (RTGR) is common in topic models and yields an incorrect topic distribution if no intervention is made in the allocation process. In the standard LDA (Latent Dirichlet Allocation) model, every word in every document carries the same statistical weight. In fact, words influence different topics to different degrees. Guided by this observation, we extend ILDA (Infinite LDA) by taking into account the biased role that words play in dividing topics. We propose a self-adaptive topic model specifically to overcome the RTGR problem. The proposed model addresses three issues: (1) the number of topics changes with the document collection, which suits dynamic data; (2) words have discriminative attributes with respect to the topic distribution; (3) a self-adaptive method performs automatic re-sampling. To verify the model, we design a topic evolution analysis system that provides the following functions: topic classification within each cycle, topic correlation between adjacent cycles, and strength calculation of the subtopics in sequence. Experiments on both the NIPS corpus and our self-built news collections show that the system meets the given requirements and that the results are feasible.
  • Keywords
    pattern classification; text analysis; ILDA; NIPS corpus; RTGR problem; adjacent cycles; automatic resampling; infinite LDA; latent Dirichlet allocation; rich topics get richer; self-adaptive topic model; self-built news collections; standard LDA model; strength calculation; topic classification; topic correlation; topic evolution analysis system; topic number; Adaptation models; Big data; Computational modeling; Data models; Integrated circuit modeling; Resource management; Tagging; Dirichlet process; infinite Latent Dirichlet Allocation; topic evolution; topic model;
  • fLanguage
    English
  • Journal_Title
    Communications, China
  • Publisher
    IEEE
  • ISSN
    1673-5447
  • Type

    jour

  • DOI
    10.1109/CC.2014.7019838
  • Filename
    7019838
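The abstract attributes the “rich topics get richer” (RTGR) effect to topic models such as ILDA, whose Dirichlet process prior lets the number of topics grow with the data. The sketch below is not the authors' self-adaptive model; it is a minimal illustration, under that assumption, of the rich-get-richer dynamic in the Chinese restaurant process view of a Dirichlet process: with i words already assigned, the next word joins an existing topic k with probability n_k / (i + alpha) and opens a new topic with probability alpha / (i + alpha). The function name `crp_assignments` and the parameter values are hypothetical choices for illustration.

```python
import random
from collections import Counter


def crp_assignments(n_customers: int, alpha: float, seed: int = 0) -> list[int]:
    """Sample table (topic) indices from a Chinese restaurant process.

    Illustrative sketch only, not the model from the paper.
    Customer i (0-based, i customers already seated) joins existing
    table k with probability sizes[k] / (i + alpha), or opens a new
    table with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    sizes: list[int] = []        # sizes[k] = number of customers at table k
    assignments: list[int] = []
    for _ in range(n_customers):
        # Unnormalized weights: existing tables by size, plus alpha for a new table.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)      # open a new table (a new topic)
        else:
            sizes[k] += 1        # rich get richer: large tables attract more customers
        assignments.append(k)
    return assignments


if __name__ == "__main__":
    n = 1000
    counts = Counter(crp_assignments(n, alpha=1.0, seed=42))
    sizes = sorted(counts.values(), reverse=True)
    print("number of topics:", len(sizes))
    print("largest five topics:", sizes[:5])
    print("share of the largest topic: %.2f" % (sizes[0] / n))
```

Runs of this kind typically show a handful of topics absorbing most of the words, because every token contributes equally to the counts that drive later assignments. The abstract's remedy is to weight words by their discriminating role and to re-sample adaptively; the exact form of that weighting is not given in this record.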