• DocumentCode
    2985327
  • Title
    RankTopic: Ranking Based Topic Modeling
  • Author
    Dongsheng Duan ; Yuhua Li ; Ruixuan Li ; Rui Zhang ; Aiming Wen
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Huazhong Univ. of Sci. & Technol., Wuhan, China
  • fYear
    2012
  • fDate
    10-13 Dec. 2012
  • Firstpage
    211
  • Lastpage
    220
  • Abstract
    Topic modeling has become a widely used tool for document management due to its superior performance. However, few topic models distinguish the importance of documents on different topics. In this paper, we investigate how to use document importance to improve topic modeling, and we propose incorporating link-based ranking into topic modeling. Specifically, topical PageRank is used to compute the topic-level ranking of documents, which indicates the importance of each document on different topics. By treating the topical ranking of a document as the probability that the document is involved in the corresponding topic, a generalized relation is built between ranking and topic modeling. Based on this relation, a ranking-based topic model, RankTopic, is proposed. With RankTopic, a mutual enhancement framework is established between ranking and topic modeling. Extensive experiments on paper citation data and Twitter data compare the performance of RankTopic with that of several state-of-the-art topic models. Experimental results show that, with a properly set balancing parameter, RankTopic performs much better than some baseline models and is comparable with the state-of-the-art link-combined relational topic model (RTM) in generalization performance, document clustering, and classification. It is also demonstrated, both quantitatively and qualitatively, that topics detected by RankTopic are more interpretable than those detected by some baseline models and remain competitive with those of RTM.
  • Keywords
    document handling; generalisation (artificial intelligence); pattern classification; pattern clustering; probability; RankTopic tool; Twitter data; balancing parameter; document classification; document clustering; document importance; document management; document ranking; generalization performance; generalized relation; mutual enhancement framework; paper citation data; ranking based topic modeling; relational topic model; topical pagerank; Computational modeling; Data models; Educational institutions; Equations; Mathematical model; Noise; Web pages; Classification; Clustering; Document Network; Ranking; Topic Modeling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2012 IEEE 12th International Conference on
  • Conference_Location
    Brussels
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4673-4649-8
  • Type
    conf
  • DOI
    10.1109/ICDM.2012.12
  • Filename
    6413901