Title :
Detecting malicious tweets in trending topics using clustering and classification
Author :
Soman, Saini Jacob ; Murugappan, S.
Author_Institution :
Fac. of CSE, Sathyabama Univ., Chennai, India
Abstract :
Spam detection in the Twitter social network is a significant research area aimed at discovering unauthorized user accounts. A number of studies have addressed this problem, but most existing techniques do not consider a varied set of features and do not group similar user trending topics, which is their major limitation. Trending topics capture current Internet trends and the topics of discussion of each user. To overcome the feature extraction problem, this work first extracts several kinds of features: user profile features, user activity features, location-based features, and text and content features. The Jensen-Shannon Divergence (JSD) measure is then applied to the extracted text features to characterize each labeled tweet using natural language models. These features are extracted from trending-topic data collected from Twitter. Once the features are extracted, clusters are formed to group the similar trending topics of tweet user profiles. The Fuzzy K-means (FKM) algorithm clusters similar user profiles that share the same trending topics, and the cluster centers are determined from the fuzzy membership function. The Extreme Learning Machine (ELM) algorithm is then applied to the clustering result to analyze the growing characteristics of spam with similar topics on Twitter and to acquire the knowledge needed for spam detection. The results are evaluated using F-measure, True Positive Rate (TPR), False Positive Rate (FPR), and classification accuracy, and show improved detection results.
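The abstract does not say how the JSD measure is computed over the language models, so the following is only a minimal sketch of one common setup: each text is turned into an add-one smoothed unigram distribution over a shared vocabulary, and the Jensen-Shannon divergence (base 2, so the value lies in [0, 1]) is taken between two such distributions. The function names, the smoothing choice, and the sample texts are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def unigram_model(text, vocab):
    """Add-one smoothed unigram distribution of `text` over `vocab`.

    Assumption: whitespace tokenization and Laplace smoothing; the paper
    does not specify either.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + len(vocab)
    return [(counts[w] + 1) / total for w in vocab]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing and are skipped.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical sample texts, invented for this sketch.
tweet = "win a free phone click this link now"
reference = "watching the match tonight with friends"

vocab = sorted(set(tweet.split()) | set(reference.split()))
score = js_divergence(unigram_model(tweet, vocab),
                      unigram_model(reference, vocab))
```

Under these assumptions, `score` could serve as one text feature per tweet: a value near 0 means the tweet's word distribution is close to the reference model, and a value near 1 means the two are almost disjoint.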
Keywords :
Internet; feature extraction; learning (artificial intelligence); pattern classification; pattern clustering; social networking (online); text analysis; ELM algorithm; FKM algorithm; FPR; Internet trends; JSD measure; Jenson-Shannon divergence measure; TPR; Twitter social networks; classification accuracy; content features; extreme learning machine algorithm; f-measure; false positive rate; feature extraction; fuzzy k-means algorithm; fuzzy membership function; location based features; malicious tweet detection; natural language models; similar user profile clustering; spam detection; text features; trending topics; true positive rate; tweet user profile; unauthorized user account discovery; user activity features; user profile features; Accuracy; Clustering algorithms; Feature extraction; Market research; Support vector machines; Twitter; Extreme learning machine algorithm; Fuzzy KMeans Clustering algorithm; Social network; Spam detection;
Conference_Title :
Recent Trends in Information Technology (ICRTIT), 2014 International Conference on
Conference_Location :
Chennai
DOI :
10.1109/ICRTIT.2014.6996188