Title :
Unsupervised Construction of Topic-Based Twitter Lists
Author :
de Villiers, F. ; Hoffmann, Marco ; Kroon, S.
Author_Institution :
Comput. Sci. Div., Stellenbosch Univ., Stellenbosch, South Africa
Abstract :
The Twitter lists feature was launched in late 2009 and enables the creation of curated groups of Twitter users. Any user can author a list and decide the basis on which other users are added to it. The most popular lists are those associated with a topic. Twitter lists can be a powerful organisation tool, but their widespread adoption has been limited. The two main obstacles are the initial setup time and the effort of continual curation. In this paper we attempt to solve the first problem by applying unsupervised clustering algorithms to construct topic-based Twitter lists. We consider k-means and affinity propagation (AP) as clustering algorithms and evaluate them using two document representation techniques: the popular term frequency-inverse document frequency (TF-IDF) weighting and the latent Dirichlet allocation (LDA) topic model. We calculate the similarities for the clustering algorithms using five well-known similarity measures that have been used extensively in the text domain. The adjusted normalised information distance (ANID) was used to compare the clustering results yielded by k-means and affinity propagation. We found that the careful selection of a similarity measure, combined with the LDA topic model, can provide a user with a sensible starting point for list creation.
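The pipeline described in the abstract (represent each user as a document, compute similarities, cluster) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each user is represented by the concatenated text of their tweets, uses a plain TF-IDF weighting, and uses cosine similarity as one common text similarity measure (the abstract does not name the five measures used). The user names, the sample texts, the seed indices, and the function names `tfidf_vectors`, `cosine`, and `kmeans_cosine` are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document (raw tf * log(N / df))."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def kmeans_cosine(vectors, seeds, iterations=10):
    """k-means-style loop: assign each vector to its most similar centroid
    (by cosine similarity), then recompute each centroid as the member mean.
    `seeds` are indices of the vectors used as initial centroids."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        for c in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            total = Counter()
            for v in members:
                for t, w in v.items():
                    total[t] += w
            centroids[c] = {t: w / len(members) for t, w in total.items()}
    return labels


# Hypothetical users, each reduced to one bag-of-words "document".
users = {
    "alice": "python code compiler code bug",
    "bob": "compiler bug python release",
    "carol": "football goal match league",
    "dave": "match goal football striker",
}
vectors = tfidf_vectors(list(users.values()))
labels = kmeans_cosine(vectors, seeds=[0, 2])
candidate_lists = {}
for name, label in zip(users, labels):
    candidate_lists.setdefault(label, []).append(name)
```

Each entry of `candidate_lists` is a group of users that could serve as a starting point for a topic-based list. Swapping the TF-IDF vectors for per-user LDA topic distributions, or the k-means loop for affinity propagation, would change only the representation or clustering step of this sketch.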
Keywords :
pattern clustering; social networking (online); text analysis; ANID; AP; LDA topic model; TF-IDF; Twitter lists feature; adjusted normalised information distance; affinity propagation; continual curation; curated group; document representation technique; k-means; latent Dirichlet allocation; list creation; organisation tool; similarity measures; term frequency-inverse document frequency; text domain; topic-based Twitter list; unsupervised clustering algorithm; unsupervised construction; Clustering algorithms; Correlation; Entropy; Indexes; Resource management; Twitter; Vectors; Twitter; clustering; similarity; topics; unsupervised;
Conference_Title :
Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom)
Conference_Location :
Amsterdam
Print_ISBN :
978-1-4673-5638-1
DOI :
10.1109/SocialCom-PASSAT.2012.64