Title :
Prediction of popular tweets using Similarity Learning
Author :
Ahmed, Hameeza ; Razzaq, Muhammad Asif ; Qamar, Ali Mustafa
Author_Institution :
Sch. of Electr. Eng. & Comput. Sci. (SEECS), Nat. Univ. of Sci. & Technol. (NUST), Islamabad, Pakistan
Abstract :
Social media is gaining popularity because of its ability to spread information. Twitter, with its massive user base, is one of the most powerful sources of information sharing. Because its data are easy to obtain, Twitter has also become a popular resource for research in areas such as social engineering, sentiment analysis, and business applications. Information on Twitter may be categorized as important or unimportant: information that spreads through re-tweets becomes important, or popular. Since popular messages carry vital information for users, their characteristics are worth studying, as they relate to breaking news identification, viral marketing, and similar tasks. In this research, we investigate predicting the popularity of messages as measured by their number of re-tweets. We cast this task as a classification problem and apply the existing Similarity Learning Algorithm (SiLA). SiLA, an extension of the voted perceptron algorithm, learns a similarity matrix for kNN classification, which is then used to classify tweets as popular or unpopular based on their content features. We perform both binary and multi-class classification. In the binary case, a tweet is considered popular if it has been re-tweeted and unpopular otherwise. In the multi-class case, SiLA uses several popularity bands defined by the re-tweet count. The binary classifier achieved 85% accuracy and the multi-class classifier achieved 73% accuracy. Experimental results show that learning the similarity measure improves accuracy compared with other kNN-based methods that use cosine similarity or Euclidean distance.
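The abstract only outlines SiLA, so the following is a minimal, simplified sketch in its spirit rather than the authors' implementation. It assumes a normalized bilinear similarity s_A(x, y) = x^T A y / (|x| |y|), which reduces to cosine similarity when A is the identity. It also assumes a perceptron-style update that fires when a tweet's k most similar same-class points do not outscore its k most similar other-class points, and a voted-perceptron-style combination of intermediate matrices weighted by how many examples each one survived. Starting from the identity matrix, the margin rule, and the band edges (1, 10, 100) are all assumptions, and every function name here (binary_label, band_label, similarity, knn_predict, train_sila) is hypothetical.

```python
import bisect
from collections import Counter

import numpy as np


def binary_label(retweets):
    """Binary task: 1 (popular) if re-tweeted at least once, else 0 (unpopular)."""
    return int(retweets > 0)


def band_label(retweets, edges=(1, 10, 100)):
    """Multi-class task: index of the popularity band. Edges are hypothetical."""
    return bisect.bisect_right(edges, retweets)


def similarity(A, x, y):
    """Normalized bilinear similarity x^T A y / (|x| |y|); cosine when A = I."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(x @ A @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def knn_predict(score, X_train, y_train, x, k):
    """Majority vote among the k training points with the highest score(x, .)."""
    scores = np.array([score(x, z) for z in X_train])
    top = np.argsort(-scores, kind="stable")[:k]
    return Counter(y_train[i] for i in top).most_common(1)[0][0]


def train_sila(X, y, k=1, epochs=5):
    """Simplified perceptron-style similarity learning (assumed update rule).

    For each training point x, compare its k most similar same-class points
    (targets) with its k most similar other-class points (impostors). If the
    targets do not win, move A towards x t^T for targets and away from x z^T
    for impostors. Intermediate matrices are averaged, weighted by how many
    examples each one survived, as in the voted perceptron.
    """
    X = np.asarray(X, float)
    y = np.asarray(y)
    n, d = X.shape
    norms = np.linalg.norm(X, axis=1)
    A = np.eye(d)   # start from cosine similarity (an assumption)
    survived = 0    # examples handled correctly by the current A
    history = []    # (matrix, survival count) pairs
    for _ in range(epochs):
        for i in range(n):
            sims = (X[i] @ A @ X.T) / (norms[i] * norms)
            same = np.where(y == y[i])[0]
            same = same[same != i]               # exclude the point itself
            other = np.where(y != y[i])[0]
            targets = same[np.argsort(-sims[same], kind="stable")[:k]]
            impostors = other[np.argsort(-sims[other], kind="stable")[:k]]
            margin = sims[targets].sum() - sims[impostors].sum()
            if margin > 0:
                survived += 1
                continue
            history.append((A.copy(), survived))
            pull = sum(np.outer(X[i], X[j]) / (norms[i] * norms[j]) for j in targets)
            push = sum(np.outer(X[i], X[j]) / (norms[i] * norms[j]) for j in impostors)
            A = A + pull - push
            survived = 1
    history.append((A, survived))
    total = sum(c for _, c in history)
    return sum(c * M for M, c in history) / total
```

Passing `lambda a, b: similarity(A, a, b)` with A = np.eye(d) to knn_predict gives the cosine baseline mentioned in the abstract, `lambda a, b: -np.linalg.norm(a - b)` gives the Euclidean baseline, and using the matrix returned by train_sila gives the learned similarity. Because s_A is linear in A, weighting the intermediate similarities by their survival counts is the same as using the weighted average of the matrices, which is why the sketch returns a single matrix.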
Keywords :
learning (artificial intelligence); matrix algebra; pattern classification; social networking (online); Euclidean distance; SiLA algorithm; Tweets classification; binary classification; breaking news identification; classification problem; cosine similarity; data availability; information sharing source; information spreading feature; k-nearest neighbor classification; kNN classification; message popularity prediction; multiclass classification; popular Tweets prediction; similarity learning algorithm; similarity matrix; similarity measures; social media; viral marketing; voted perceptron algorithm; Accuracy; Classification algorithms; Media; Standards; Symmetric matrices; Training; Twitter; SiLA algorithm; Similarity learning; kNN classification; popular tweets; social networks;
Conference_Title :
Emerging Technologies (ICET), 2013 IEEE 9th International Conference on
Conference_Location :
Islamabad
Print_ISBN :
978-1-4799-3456-0
DOI :
10.1109/ICET.2013.6743524