Author_Institution :
Sch. of Comput. Sci., Simon Fraser Univ., Burnaby, BC, Canada
Abstract :
In recent years, social media websites such as Epinions, Twitter, and Google+ have grown in popularity and become ubiquitous in daily life, with rich user-generated text propagating through social networks. Topic models such as Latent Dirichlet Allocation (LDA) have been proposed and shown to be useful for text analysis. Existing topic models focus on traditional document collections, which consist of a relatively small number of long, high-quality documents. User-generated texts, however, tend to be shorter and noisier than traditional content. In addition, social networks have two novel features that existing topic models have not considered: context information on nodes, such as user features, and context information on edges, such as relationships. In this paper, we pose the problem of finding user topics in large-scale collections of documents from online social networks. We propose a comprehensive Feature-based Topic model and a Social-based Topic model, which take into account user features and social networks. We demonstrate that our models perform better than a baseline LDA on the Epinions, Twitter, and Google+ data sets.
Keywords :
social networking (online); text analysis; Epinions; Google+; LDA; Twitter; context information; document collection; feature based topic model; latent Dirichlet allocation; online social media; online social network; social based topic model; social media website; topic modeling; user features; user-generated text; Computational modeling; Data models; Google; Media; Vectors; Social Network; User Feature