DocumentCode
589051
Title
Unsupervised Construction of Topic-Based Twitter Lists
Author
de Villiers, F. ; Hoffmann, Marco ; Kroon, S.
Author_Institution
Comput. Sci. Div., Stellenbosch Univ., Stellenbosch, South Africa
fYear
2012
fDate
3-5 Sept. 2012
Firstpage
283
Lastpage
292
Abstract
The Twitter lists feature was launched in late 2009 and enables the creation of curated groups of Twitter users. Each user can be a list author and decide the basis on which other users are added to a list. The most popular lists are those associated with a topic. Twitter lists can be used as a powerful organisation tool, but their widespread adoption has been limited. The two main obstacles are the initial setup time and the effort of continual curation. In this paper we attempt to solve the first problem by applying unsupervised clustering algorithms to construct topic-based Twitter lists. We consider k-means and affinity propagation (AP) as clustering algorithms and evaluate them using two document representation techniques: the popular term frequency-inverse document frequency (TF-IDF) and the latent Dirichlet allocation (LDA) topic model. We calculate the similarities for the clustering algorithms using five well-known similarity measures that have been used extensively in the text domain. We use the adjusted normalised information distance (ANID) to compare the clustering results yielded by k-means and affinity propagation. We found that the careful selection of a similarity measure, combined with the LDA topic model, can provide a user with a sensible starting point for list creation.
Keywords
pattern clustering; social networking (online); text analysis; ANID; AP; LDA topic model; TF-IDF; Twitter lists feature; adjusted normalised information distance; affinity propagation; continual curation; curated group; document representation technique; k-means; latent Dirichlet allocation; list creation; organisation tool; similarity measures; term frequency-inverse document frequency; text domain; topic-based Twitter list; unsupervised clustering algorithm; unsupervised construction; Clustering algorithms; Correlation; Entropy; Indexes; Resource management; Twitter; Vectors; Twitter; clustering; similarity; topics; unsupervised;
fLanguage
English
Publisher
IEEE
Conference_Titel
Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom)
Conference_Location
Amsterdam
Print_ISBN
978-1-4673-5638-1
Type
conf
DOI
10.1109/SocialCom-PASSAT.2012.64
Filename
6406257