Title :
Web Spam: A Study of the Page Language Effect on the Spam Detection Features
Author :
Alarifi, Abdulrahman ; Alsaleh, Mansour
Author_Institution :
Comput. Res. Inst., King Abdulaziz City for Sci. & Technol., Riyadh, Saudi Arabia
Abstract :
Although search engines have deployed various techniques to detect and filter out Web spam, Web spammers continue to develop new tactics to influence the results of search engine ranking algorithms in order to obtain undeservedly high rankings. In this paper, we study the effect of the page language on spam detection features. We examine how the distribution of a set of selected detection features changes according to the page language. We also study the effect of the page language on the detection rate of a given classifier using a selected set of detection features. The analysis results show that selecting suitable features for a classifier that segregates spam pages depends heavily on the language of the examined Web page, due in part to the different sets of Web spam mechanisms used by each type of spammer.
Keywords :
Internet; Web sites; unsolicited e-mail; Web page; Web spam mechanisms; Web spammers; page language effect; search engines; spam detection features; spam pages; Browsers; Decision trees; Feature extraction; MATLAB; Search engines; Web pages; Content spam; Link spam; Search engine spam; Spamdexing; Web spam;
Conference_Title :
Machine Learning and Applications (ICMLA), 2012 11th International Conference on
Conference_Location :
Boca Raton, FL
Print_ISBN :
978-1-4673-4651-1
DOI :
10.1109/ICMLA.2012.229