DocumentCode :
2118643
Title :
Effectively Detecting Content Spam on the Web Using Topical Diversity Measures
Author :
Cailing Dong ; Bin Zhou
Author_Institution :
Dept. of Inf. Syst., Univ. of Maryland, Baltimore, MD, USA
Volume :
1
fYear :
2012
fDate :
4-7 Dec. 2012
Firstpage :
266
Lastpage :
273
Abstract :
Recent studies on web spam detection have used various content-based and link-based features to construct spam classification models. In this paper, we conduct a thorough analysis of content spam on the web using topic models and propose several novel topical diversity measures for content spam detection. We adopt the web spam benchmark data set WEBSPAM-UK2007 for evaluation, and the experimental results verify that integrating our topical diversity measures can greatly improve the performance of state-of-the-art web spam detection methods. In addition, compared to existing features for training spam classification models, our topical diversity measures can achieve high spam detection performance using a small set of training data. In personalized web spam detection, the training data (i.e., a user's spam labeling results) are typically small, so this finding makes personalized web spam detection highly achievable. We develop an efficient and effective regression model using topical diversity measures for personalized web spam detection, and present some promising results obtained from an empirical study.
Keywords :
Internet; pattern classification; unsolicited e-mail; WEBSPAM-UK2007; Web spam benchmark data; content-based features; link-based features; personalized Web content spam detection; regression model; spam classification model; topic models; topical diversity measures; training data; training spam classification models; classification; personalization; topic model; web spam;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2012 IEEE/WIC/ACM International Conferences on
Conference_Location :
Macau
Print_ISBN :
978-1-4673-6057-9
Type :
conf
DOI :
10.1109/WI-IAT.2012.98
Filename :
6511895