Title :
Search engine click spam detection
Author :
Xin Li ; Min Zhang ; Yiqun Liu ; Shaoping Ma ; Yijiang Jin ; Liyun Ru
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
fDate :
Oct. 30 2012-Nov. 1 2012
Abstract :
Using search engines to retrieve information has become an important part of people's daily life. For most search engines, click information is a significant factor in document ranking. As a result, some websites cheat to obtain a higher rank by fraudulently increasing the clicks on their pages in search results, with the aim of commercial gain; this practice is called “click spam”. Based on an analysis of the features of cheating clicks, this paper proposes a novel automatic click spam detection approach consisting of three steps: 1) detect spam at the level of single click records, which flags 0.54% of all clicks as spam; 2) model user sessions as a sequence of triples which, to the best of our knowledge, is the first in related research to take into account not only the user action but also the action object and the time interval between actions; 3) based on the detected single click record spam and other features, find seed cheating session modes, and then apply a bipartite graph iterative algorithm to raise the precision and recall of click spam detection. Experiments were conducted on real log data from a Chinese commercial search engine, containing around 80 million user clicks per day. As a result, 2.1% of all clicks are detected as spam, with a precision of 97%. The proposed framework detects click spam precisely and efficiently, and can be easily implemented in a real-world commercial search engine service.
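Steps 2 and 3 of the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual features, update rule, or thresholds, so everything below is an assumption made for illustration: the function names `to_triples` and `propagate`, the interval bucket bounds, the mean-based score update, and the 0.5 threshold are all hypothetical and are not the authors' method.

```python
from bisect import bisect_right
from collections import defaultdict

def to_triples(events, bounds=(1, 10, 60)):
    """Turn (timestamp, action, obj) events into (action, obj, interval_bucket)
    triples. The bucket bounds (in seconds) are an illustrative assumption."""
    triples, prev_ts = [], None
    for ts, action, obj in events:
        gap = 0 if prev_ts is None else ts - prev_ts
        triples.append((action, obj, bisect_right(bounds, gap)))
        prev_ts = ts
    return tuple(triples)

def propagate(session_patterns, seed_patterns, rounds=3, threshold=0.5):
    """Spread suspicion over a session/pattern bipartite graph.
    session_patterns: dict mapping session id -> set of pattern ids.
    seed_patterns: patterns assumed to be known cheating modes.
    The averaging rule is a hypothetical stand-in for the paper's algorithm."""
    pattern_sessions = defaultdict(set)
    for session, patterns in session_patterns.items():
        for pattern in patterns:
            pattern_sessions[pattern].add(session)

    pattern_score = {p: 1.0 for p in seed_patterns}
    session_score = {}
    for _ in range(rounds):
        # A session's score is the mean score of the patterns it contains.
        for session, patterns in session_patterns.items():
            total = sum(pattern_score.get(p, 0.0) for p in patterns)
            session_score[session] = total / len(patterns) if patterns else 0.0
        # A non-seed pattern's score is the mean score of its sessions.
        for pattern, sessions in pattern_sessions.items():
            if pattern not in seed_patterns:
                pattern_score[pattern] = (
                    sum(session_score[s] for s in sessions) / len(sessions)
                )
    return {s for s, score in session_score.items() if score >= threshold}

events = [(0, "query", "q1"), (2, "click", "url1"), (2.5, "click", "url1")]
triples = to_triples(events)

sessions = {"s1": {"A"}, "s2": {"A", "B"}, "s3": {"B"}, "s4": {"C"}}
flagged = propagate(sessions, seed_patterns={"A"})
```

In this toy input, pattern "A" is the only seed; suspicion flows from seed patterns to the sessions that contain them and back to co-occurring patterns, which is the general idea behind iterating over a bipartite graph.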
Keywords :
Web sites; document handling; graph theory; information retrieval; iterative methods; search engines; unsolicited e-mail; Chinese commercial search engine real log data; automatic click spam detection approach; bipartite graph iterative algorithm; cheating clicks; cheating methods; click information; daily life; document ranking; real world commercial search engine service; search engine click spam detection; single click record detection; user session model; Algorithm design and analysis; Bipartite graph; Classification algorithms; Feature extraction; Iterative methods; Search engines; Unsolicited electronic mail; Bipartite graph iterative algorithm; Click spam; User session;
Conference_Titel :
Cloud Computing and Intelligent Systems (CCIS), 2012 IEEE 2nd International Conference on
Conference_Location :
Hangzhou
Print_ISBN :
978-1-4673-1855-6
DOI :
10.1109/CCIS.2012.6664324