DocumentCode :
2892632
Title :
Web Spam: A Study of the Page Language Effect on the Spam Detection Features
Author :
Alarifi, Abdulrahman ; Alsaleh, Mansour
Author_Institution :
Comput. Res. Inst., King Abdulaziz City for Sci. & Technol., Riyadh, Saudi Arabia
Volume :
2
fYear :
2012
fDate :
12-15 Dec. 2012
Firstpage :
216
Lastpage :
221
Abstract :
Although search engines have deployed various techniques to detect and filter out Web spam, Web spammers continue to develop new tactics to influence the results of search engines' ranking algorithms in order to obtain undeservedly high ranks. In this paper, we study the effect of the page language on spam detection features. We examine how the distribution of a set of selected detection features changes according to the page language. We also study the effect of the page language on the detection rate of a given classifier using a selected set of detection features. The analysis results show that selecting suitable features for a classifier that segregates spam pages depends heavily on the language of the examined Web page, due in part to the different sets of Web spam mechanisms used by each type of spammer.
Keywords :
Internet; Web sites; unsolicited e-mail; Web page; Web spam mechanisms; Web spammers; page language effect; search engines; spam detection features; spam pages; Browsers; Decision trees; Feature extraction; MATLAB; Search engines; Web pages; Content spam; Link spam; Search engine spam; Spamdexing; Web spam
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Applications (ICMLA), 2012 11th International Conference on
Conference_Location :
Boca Raton, FL
Print_ISBN :
978-1-4673-4651-1
Type :
conf
DOI :
10.1109/ICMLA.2012.229
Filename :
6406753