• DocumentCode
    610359
  • Title

    A unified model for stable and temporal topic detection from social media data

  • Author

    Hongzhi Yin ; Bin Cui ; Hua Lu ; Yuxin Huang ; Junjie Yao

  • Author_Institution
    Dept. of Comput. Sci. & Technol., Peking Univ., Beijing, China
  • fYear
    2013
  • fDate
    8-12 April 2013
  • Firstpage
    661
  • Lastpage
    672
  • Abstract
    Web 2.0 users generate and spread huge amounts of messages in online social media. Such user-generated contents are mixture of temporal topics (e.g., breaking events) and stable topics (e.g., user interests). Due to their different natures, it is important and useful to distinguish temporal topics from stable topics in social media. However, such a discrimination is very challenging because the user-generated texts in social media are very short in length and thus lack useful linguistic features for precise analysis using traditional approaches. In this paper, we propose a novel solution to detect both stable and temporal topics simultaneously from social media data. Specifically, a unified user-temporal mixture model is proposed to distinguish temporal topics from stable topics. To improve this model´s performance, we design a regularization framework that exploits prior spatial information in a social network, as well as a burst-weighted smoothing scheme that exploits temporal prior information in the time dimension. We conduct extensive experiments to evaluate our proposal on two real data sets obtained from Del.icio.us and Twitter. The experimental results verify that our mixture model is able to distinguish temporal topics from stable topics in a single detection process. Our mixture model enhanced with the spatial regularization and the burst-weighted smoothing scheme significantly outperforms competitor approaches, in terms of topic detection accuracy and discrimination in stable and temporal topics.
  • Keywords
    Internet; information retrieval; linguistics; social networking (online); text analysis; Del.icio.us; Twitter; UGC; Web 2.0; burst-weighted smoothing scheme; linguistic features; online social media; spatial regularization; stable topic detection; temporal topic detection; user-generated contents; user-temporal mixture model; Equations; Feature extraction; Hidden Markov models; Mathematical model; Media; Twitter;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Engineering (ICDE), 2013 IEEE 29th International Conference on
  • Conference_Location
    Brisbane, QLD
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4673-4909-3
  • Electronic_ISBN
    1063-6382
  • Type

    conf

  • DOI
    10.1109/ICDE.2013.6544864
  • Filename
    6544864