Unsupervised Construction of Topic-Based Twitter Lists

Author

de Villiers, F. ; Hoffmann, Marco ; Kroon, S.

Author_Institution

Comput. Sci. Div., Stellenbosch Univ., Stellenbosch, South Africa

fYear

2012

fDate

3-5 Sept. 2012

Firstpage

283

Lastpage

292

Abstract

The Twitter lists feature was launched in late 2009 and enables the creation of curated groups containing Twitter users. Each user can be a list author and decide the basis on which other users are added to a list. The most popular lists are those associated with a topic. Twitter lists can be used as a powerful organisation tool, but their widespread adoption has been limited. The two main obstacles are the initial setup time and the effort of continual curation. In this paper we attempt to solve the first problem by applying unsupervised clustering algorithms to construct topic-based Twitter lists. We consider k-means and affinity propagation (AP) as clustering algorithms and evaluate these algorithms using two document representation techniques. The selected representation techniques are the popular term frequency-inverse document frequency (TF-IDF) and the latent Dirichlet allocation (LDA) topic model. We calculate the similarities for the clustering algorithms using five well-known similarity measures that have been used extensively in the text domain. The adjusted normalised information distance (ANID) was used to compare the clustering results yielded by k-means and affinity propagation. We found that the careful selection of a similarity measure, combined with the LDA topic model, can provide a user with a sensible starting point for list creation.
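
The representation and similarity steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy users, the function names, and the choice of cosine similarity are assumptions made for illustration (the abstract does not name the five similarity measures). Each "document" is taken to be one user's concatenated tweets; the resulting pairwise similarities are the kind of input a clustering algorithm such as k-means or affinity propagation would consume.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenised document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all-zero."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy data (assumption): one token list per Twitter user.
users = {
    "alice": "python code python bugs".split(),
    "bob": "python code release".split(),
    "carol": "football goal match".split(),
    "dave": "football match league".split(),
}

names = list(users)
vectors = tfidf_vectors([users[name] for name in names])

# Pairwise similarity table keyed by (user, user).
similarity = {
    (a, b): cosine_similarity(vectors[i], vectors[j])
    for i, a in enumerate(names)
    for j, b in enumerate(names)
}
```

In this toy example, users who tweet about the same subject share weighted terms and therefore score higher than users with disjoint vocabularies, which is the property a topic-based grouping relies on.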

Keywords

pattern clustering; social networking (online); text analysis; ANID; AP; LDA topic model; TF-IDF; Twitter lists feature; adjusted normalised information distance; affinity propagation; continual curation; curated group; document representation technique; k-means; latent Dirichlet allocation; list creation; organisation tool; similarity measures; term frequency-inverse document frequency; text domain; topic-based Twitter list; unsupervised clustering algorithm; unsupervised construction; Clustering algorithms; Correlation; Entropy; Indexes; Resource management; Twitter; Vectors; Twitter; clustering; similarity; topics; unsupervised;

fLanguage

English

Publisher

IEEE

Conference_Titel

Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom)

Conference_Location

Amsterdam

Print_ISBN

978-1-4673-5638-1

Type

conf

DOI

10.1109/SocialCom-PASSAT.2012.64

Filename

6406257