• DocumentCode
    3447286
  • Title

    Feature expansion for Microblogging text based on Latent Dirichlet Allocation with User Feature

  • Author

    Wei Xia ; Yanxiang He ; Ye Tian ; Qiang Chen ; Lu Lin

  • Author_Institution
    Sch. of Comput., Wuhan Univ., Wuhan, China
  • Volume
    1
  • fYear
    2011
  • fDate
    20-22 Aug. 2011
  • Firstpage
    228
  • Lastpage
    232
  • Abstract
    Traditional Topic Detection and Tracking (TDT) is based on large-scale news streams. However, with the development of new technology, microblogging platforms have become a new generation of platform for information distribution and communication. Because microblogging text has many features that differ entirely from those of ordinary news reports, older TDT methods become ineffective. We present a new framework based on U-LDA (Latent Dirichlet Allocation with User Feature), which takes into account user features on the microblogging platform. We expand the features of short text on the microblogging platform using the U-LDA model, which improves the precision of TDT tasks. In this paper, we discuss and summarize the particular features of microblogging text and present a method that incorporates user features into the LDA model; on this basis we propose a general TDT framework based on the U-LDA model. By applying the new model to a microblogging corpus, we conclude that U-LDA is more effective than LDA.
  • Keywords
    social networking (online); text analysis; TDT tasks; U-LDA; feature expansion; information distribution and communication; latent Dirichlet allocation with user feature; microblogging corpus; microblogging platform; microblogging text; topic detection and tracking; LDA model; TDT; short text; user features;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology and Artificial Intelligence Conference (ITAIC), 2011 6th IEEE Joint International
  • Conference_Location
    Chongqing
  • Print_ISBN
    978-1-4244-8622-9
  • Type

    conf

  • DOI
    10.1109/ITAIC.2011.6030192
  • Filename
    6030192