Title :
Web Spam Detection by Exploring Densely Connected Subgraphs
Author :
Leon-Suematsu, Yutaka I. ; Inui, Kentaro ; Kurohashi, Sadao ; Kidawara, Yutaka
Author_Institution :
Nat. Inst. of Inf. & Commun. Technol., Kyoto, Japan
Abstract :
In this paper, we present a Web spam detection algorithm that relies on link analysis. The method consists of three steps: (1) decomposition of Web graphs into densely connected subgraphs and calculation of features for each subgraph, (2) use of SVM classifiers to identify subgraphs composed of Web spam, and (3) propagation of predictions over Web graphs by a biased PageRank algorithm to expand the scope of identification. We performed experiments on a public benchmark. An empirical study of the core structure of Web graphs suggests that highly ranked non-spam hosts can be identified by examining the coreness of the Web graph elements.
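Step (3) of the method can be illustrated with a minimal sketch of a biased (personalized) PageRank, in which the teleport distribution is concentrated on hosts that a classifier has already flagged. This is not the authors' implementation: the function name `biased_pagerank`, the damping factor of 0.85, the fixed iteration count, and the choice to propagate scores along the edges exactly as given are all assumptions made for illustration, since the abstract does not specify them.

```python
def biased_pagerank(out_links, seed_scores, damping=0.85, iterations=50):
    """Propagate seed scores over a directed graph (illustrative sketch).

    out_links:   dict mapping each node to a list of nodes it links to;
                 every link target must also appear as a key (assumption).
    seed_scores: dict mapping some nodes to non-negative classifier scores;
                 these define the biased teleport distribution.
    """
    nodes = sorted(out_links)
    total = sum(seed_scores.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("seed_scores must give positive weight to a node")

    # Teleport only to seed nodes, in proportion to their scores.
    teleport = {n: seed_scores.get(n, 0.0) / total for n in nodes}
    rank = dict(teleport)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                # Split the damped rank evenly among outgoing links.
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    nxt[dst] += share
            else:
                # Dangling node: return its mass to the teleport set.
                for n in nodes:
                    nxt[n] += damping * rank[src] * teleport[n]
        rank = nxt
    return rank


# Hypothetical four-host graph; host "a" is the only flagged seed.
graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
scores = biased_pagerank(graph, {"a": 1.0})
```

Hosts reachable from the seed accumulate score that decays with distance, while a host with no path from any seed stays at zero. Whether scores should flow along links or against them depends on the propagation assumption being modeled, so the caller is expected to pass the graph in the desired direction.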
Keywords :
Internet; graph theory; pattern classification; support vector machines; unsolicited e-mail; PageRank algorithm; SVM classifier; Web graph decomposition; Web spam detection; densely connected subgraph; link analysis; nonspam host; Algorithm design and analysis; Feature extraction; Search engines; Support vector machines; Testing; Training; Unsolicited electronic mail; Web spam; biased pagerank; dense subgraphs;
Conference_Title :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2011 IEEE/WIC/ACM International Conference on
Conference_Location :
Lyon
Print_ISBN :
978-1-4577-1373-6
Electronic_ISBN :
978-0-7695-4513-4
DOI :
10.1109/WI-IAT.2011.152