Real-Time Twitter Content Polluter Detection Based on Direct Features

Author

Weiling Chen;Chai Kiat Yeo;Chiew Tong Lau;Bu Sung Lee

Author_Institution

Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore

fYear

2015

Firstpage

1

Lastpage

4

Abstract

The large number of content polluters on social networks makes it difficult for users to find valuable content. Some research has addressed spam and phishing detection on social networks, but spam and phishing make up only a small part of all content pollution. What bothers users most is the large volume of repeated, low-quality advertisements. Hence it is necessary to filter out content polluters to improve the user experience. Moreover, most phishing/spam detection work is done offline, and some of the features used take too long to extract, which makes real-time detection impossible. We study an extensive Twitter dataset and present a definition of content polluters. We further propose some novel features and, together with other features commonly used in phishing/spam detection, classify them into two categories: direct features and indirect features. A simple random forest classifier using only our proposed direct features is applied to real-time content polluter detection, and it achieves reasonably high accuracy with high F1 values.
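
The abstract does not list the direct features themselves, so the following is only a minimal sketch of the general idea. It assumes that a "direct" feature is one that can be computed from the tweet text alone, with no extra network lookups or historical data. The function name `direct_features`, the regular expressions, and the five features chosen are all hypothetical illustrations, not the authors' feature set.

```python
import re

# Hypothetical patterns for illustration; the paper's actual features are not given in the abstract.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def direct_features(text: str) -> dict:
    """Compute features from the tweet text alone (assumed meaning of "direct").

    Everything here is a single pass over the string, so it is cheap enough
    to run as each tweet arrives.
    """
    return {
        "length": len(text),                           # characters in the tweet
        "num_words": len(text.split()),                # whitespace-separated tokens
        "num_urls": len(URL_RE.findall(text)),         # embedded links
        "num_hashtags": len(HASHTAG_RE.findall(text)), # '#tag' occurrences
        "num_mentions": len(MENTION_RE.findall(text)), # '@user' occurrences
    }


if __name__ == "__main__":
    print(direct_features("Buy now http://x.co #deal #sale @bob"))
```

A feature dictionary like this could then be turned into a numeric vector and passed to a random forest implementation, for example scikit-learn's `RandomForestClassifier`; that pairing is a suggestion here, not something the abstract specifies.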

Keywords

"Feature extraction","Twitter","Real-time systems","Labeling","Electronic mail","Training"

Publisher

IEEE

Conference_Titel

Information Science and Security (ICISS), 2015 2nd International Conference on

Type

conf

DOI

10.1109/ICISSEC.2015.7371027

Filename

7371027