Title :
Predicting latent attributes by extracting lexical and sociolinguistics features from user tweets
Author :
Al Hawari, Muhammad Afif ; Khodra, Masayu Leylia
Author_Institution :
Sch. of Electr. Eng. & Inf., Inst. Teknol. Bandung, Bandung, Indonesia
Abstract :
Twitter user profile information is useful in fields such as marketing, HRD, advertising, and personalization. Because the profile information provided by Twitter is very limited, latent attributes such as gender, age, work, and interest must be predicted. In this paper, we aim to predict these four latent attributes from a user's tweets and bio data using machine learning techniques. We conduct experiments to find the best algorithm, weighting method, minimum frequency threshold, and preprocessing for each latent attribute we predict. We also compare the accuracy of classification models built on lexical features with that of models built on sociolinguistic features. Our experiments show that SVM is the best-performing algorithm and that lexical feature models perform better than sociolinguistic feature models.
Keywords :
learning (artificial intelligence); pattern classification; social networking (online); social sciences; support vector machines; HRD; SVM; Twitter user profile information; advertising; age; bio data; gender; interest; latent attributes prediction; lexical feature; machine learning techniques; marketing; personalization; sociolinguistic feature classification models; support vector machine; user tweets; weighting method; work; Blogs; Communities; Conferences; Data mining; Decision support systems; Rhetoric; User-generated content; Twitter; classification; latent attribute; lexical feature; machine learning; sociolinguistic feature;
Conference_Title :
Data and Software Engineering (ICODSE), 2014 International Conference on
Print_ISBN :
978-1-4799-8175-5
DOI :
10.1109/ICODSE.2014.7062666