DocumentCode :
3732285
Title :
Finding High-Level Topics and Tweet Labeling Using Topic Models
Author :
Sameendra Samarawickrama;Shanika Karunasekera;Aaron Harwood
Author_Institution :
Dept. of Comput. &
fYear :
2015
Firstpage :
242
Lastpage :
249
Abstract :
Making sense of Twitter data streams is challenging due to the extremely high volume of data. One way to address this challenge is to consider these data streams as containing a set of high-level topics. In this research we address the following problem: given a collection of tweets about K high-level topics, how to find topic words that describe these topics and how to label each tweet with one of the K topics using a topic modeling approach. Current research has shown that applying topic modeling algorithms directly to tweets does not lead to good results. Hence, one approach is to group related tweets together to form a single “pseudo-document”, which is more informative than a single tweet. In this paper we evaluate different grouping schemes found in the literature and propose a new grouping scheme utilizing named entities and word collocations. Results show that our proposed scheme performs better than the existing approaches, to some extent, for all the test cases, and for both finding high-level topics and tweet labeling.
Keywords :
"Twitter","Labeling","Tagging","Blogs","Distributed databases","Media","Resource management"
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Systems (ICPADS), 2015 IEEE 21st International Conference on
Electronic_ISBN :
1521-9097
Type :
conf
DOI :
10.1109/ICPADS.2015.38
Filename :
7384301