DocumentCode :
624004
Title :
Topic extraction in social media
Author :
Rafea, Ahmed ; Mostafa, Nada A.
Author_Institution :
Comput. Sci. & Eng. Dept., American Univ. in Cairo, Cairo, Egypt
fYear :
2013
fDate :
20-24 May 2013
Firstpage :
94
Lastpage :
98
Abstract :
Social networks have become the most important source of news and of people's feedback and opinions about almost every daily topic. With this massive amount of information on the web from different social networks such as Twitter, Facebook, and blogs, there has to be an automatic tool that can determine the topics that people are talking about and what their sentiments about these topics are. The goal of the research described in this paper was to develop a prototype that can "feel" the pulse of Arabic users with regard to a certain hot topic. Our experience in extracting Arabic hot topics from Twitter is presented in this paper. The unigram words that occurred more than 20 times in the whole corpus were used as features for clustering the tweets using the bisecting k-means clustering algorithm. This resulted in a purity of 0.704 and an entropy of 0.275. The score generated for the quality of the generated topic was 72.5%.
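The purity and entropy figures quoted in the abstract are standard external measures of clustering quality. The following is a minimal sketch of how such scores are commonly computed from cluster assignments and reference topic labels; it is not the authors' code. The function name `purity_and_entropy`, the toy data, and the choice to normalize each cluster's entropy by the logarithm of the number of reference classes are all assumptions made for illustration; the record does not state which variant the paper used.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_ids, class_labels):
    """Return (purity, entropy) for a clustering against reference labels.

    Assumption: per-cluster entropy is normalized by log(number of classes),
    so the result lies in [0, 1]. The paper's exact formula is not given here.
    """
    if not cluster_ids or len(cluster_ids) != len(class_labels):
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(cluster_ids)
    num_classes = len(set(class_labels))

    # Count how many items of each reference class fall into each cluster.
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        by_cluster[cluster][label] += 1

    purity = 0.0
    entropy = 0.0
    for counts in by_cluster.values():
        size = sum(counts.values())
        # Purity: share of all items that belong to their cluster's majority class.
        purity += max(counts.values()) / n
        # Entropy of the class distribution inside this cluster.
        h = -sum((k / size) * math.log(k / size) for k in counts.values())
        if num_classes > 1:
            h /= math.log(num_classes)
        # Weight each cluster by its relative size.
        entropy += (size / n) * h
    return purity, entropy


if __name__ == "__main__":
    # Hypothetical toy data: six tweets, two clusters, two reference topics.
    clusters = [0, 0, 0, 1, 1, 1]
    topics = ["a", "a", "b", "b", "b", "b"]
    print(purity_and_entropy(clusters, topics))
```

Higher purity and lower entropy both indicate clusters that align more closely with the reference topics, which is the direction in which the reported values of 0.704 and 0.275 should be read.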
Keywords :
data mining; entropy; information retrieval; natural language processing; pattern clustering; social networking (online); word processing; Arabic hot topic extraction; Arabic user pulse feeling; Blogs; Facebook; Twitter; bisecting k-mean clustering algorithm; entropy; news source; people feedback; people opinion; people sentiments; purity; social media; social networks; tweets clustering; unigram words; Blogs; Clustering algorithms; Data mining; Entropy; Feature extraction; Heuristic algorithms; Twitter; Clustering; Information Extraction; Social Media Applications; Twitter;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Collaboration Technologies and Systems (CTS), 2013 International Conference on
Conference_Location :
San Diego, CA
Print_ISBN :
978-1-4673-6403-4
Type :
conf
DOI :
10.1109/CTS.2013.6567212
Filename :
6567212