Title :
Splog Detection using Content, Time and Link Structures
Author :
Lin, Yu-Ru ; Sundaram, Hari ; Chi, Yun ; Tatemura, Jun ; Tseng, Belle
Author_Institution :
Arizona State Univ., Tempe
Abstract :
This paper focuses on spam blog (splog) detection. Blogs are a highly popular new-media mechanism for social communication, and splogs corrupt blog search results and waste network resources. Our approach exploits the distinctive temporal dynamics of blogs to detect splogs. The key idea is that splogs exhibit high temporal regularity in content and post time, as well as consistent linking patterns. Temporal content regularity is detected using a novel autocorrelation of post content. Temporal structural regularity is determined using the entropy of the post time difference distribution, while link regularity is computed using a HITS-based hub score measure. Experiments against annotated ground truth on a real-world dataset show 90% accuracy on splog detection tasks.
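As a rough illustration of the temporal structural regularity idea, the sketch below computes the Shannon entropy of binned gaps between consecutive posts. It is not the paper's feature definition: the function name `post_interval_entropy`, the one-hour bin width, and the sample timestamps are all assumptions made for this example. Under these assumptions, a blog that posts on a fixed schedule concentrates its gaps in one bin and scores near zero, while irregular posting spreads the gaps across bins and scores higher.

```python
import math
from collections import Counter


def post_interval_entropy(post_times, bin_seconds=3600):
    """Shannon entropy (in bits) of the binned gaps between consecutive posts.

    post_times: post timestamps in seconds (any order).
    bin_seconds: hypothetical bin width; one hour is an arbitrary choice.
    """
    times = sorted(post_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if not gaps:
        return 0.0  # fewer than two posts: no interval distribution to measure
    # Histogram of gaps, each gap assigned to a bin of width bin_seconds.
    counts = Counter(int(gap // bin_seconds) for gap in gaps)
    total = len(gaps)
    # H = sum over bins of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Made-up timestamps for illustration only.
regular = [i * 6 * 3600 for i in range(20)]  # one post every six hours
irregular = [0, 1800, 9000, 40000, 41000, 90000, 200000, 203000, 400000]

print(post_interval_entropy(regular))
print(post_interval_entropy(irregular))
```

Low entropy alone would not identify a splog; the abstract describes combining it with content autocorrelation and a HITS-based hub score.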
Keywords :
Web sites; unsolicited e-mail; HITS based hub score measure; autocorrelation; blog temporal dynamics; content structures; high temporal regularity; link structures; linking patterns; media social communication; post time difference distribution; spam blog detection; splog detection; structural regularity; time structures; waste network resources; Autocorrelation; Data mining; Distributed computing; Entropy; Information services; Internet; Joining processes; Vectors; Web pages
Conference_Title :
Multimedia and Expo, 2007 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
1-4244-1016-9
Electronic_ISBN :
1-4244-1017-7
DOI :
10.1109/ICME.2007.4285079