• DocumentCode
    643902
  • Title

    Search engine click spam detection

  • Author

    Xin Li ; Min Zhang ; Yiqun Liu ; Shaoping Ma ; Yijiang Jin ; Liyun Ru

  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
  • Volume
    02
  • fYear
    2012
  • fDate
    Oct. 30 2012-Nov. 1 2012
  • Firstpage
    985
  • Lastpage
    989
  • Abstract
    Using search engines to retrieve information has become an important part of people's daily lives. For most search engines, click information is a significant factor in document ranking. As a result, some websites cheat to obtain a higher rank by fraudulently increasing clicks on their pages in search results for substantial commercial gain, a practice called “Click Spam”. Based on an analysis of the features of cheating clicks, this paper proposes a novel automatic click spam detection approach consisting of three steps: 1) detect single click record spam, by which 0.54% of all clicks are detected as spam; 2) model user sessions with a triple sequence that, for the first time to the best of our knowledge in related research, takes into consideration not only the user action but also the action object and the time interval between actions; 3) based on the detected single click record spam and other features, find seed cheating session modes, and then use a bipartite graph iterative algorithm to achieve higher precision and recall of click spam detection. Experiments were conducted on real log data from a Chinese commercial search engine, containing around 80 million user clicks per day. As a result, 2.1% of all clicks are detected as spam, with a precision of 97%. The proposed framework detects click spam precisely and efficiently and can be easily implemented in a real-world commercial search engine service.
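The abstract does not spell out the session model or the graph algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a session is a list of (action, object, time interval) triples, that a "session mode" is that sequence with intervals coarsely bucketed, and that the bipartite graph links users to session patterns. The function names, bucket boundaries, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def bucket_interval(seconds):
    """Coarse time bucket; the boundaries are hypothetical, not from the paper."""
    if seconds < 2:
        return "fast"
    if seconds < 30:
        return "normal"
    return "slow"


def session_pattern(session):
    """Turn a session of (action, object, interval_seconds) triples into a hashable pattern."""
    return tuple((action, obj, bucket_interval(t)) for action, obj, t in session)


def propagate(user_sessions, seed_patterns, threshold=0.5, max_iter=10):
    """Spread spam labels back and forth over an assumed user/pattern bipartite graph."""
    user_to_patterns = {
        user: {session_pattern(s) for s in sessions}
        for user, sessions in user_sessions.items()
    }
    pattern_to_users = defaultdict(set)
    for user, patterns in user_to_patterns.items():
        for pattern in patterns:
            pattern_to_users[pattern].add(user)

    spam_patterns = set(seed_patterns)
    spam_users = set()
    for _ in range(max_iter):
        # A user is flagged when enough of its patterns are already flagged.
        new_users = {
            user for user, patterns in user_to_patterns.items()
            if patterns and len(patterns & spam_patterns) / len(patterns) >= threshold
        }
        # A pattern is flagged when enough of the users producing it are flagged.
        new_patterns = spam_patterns | {
            pattern for pattern, users in pattern_to_users.items()
            if len(users & new_users) / len(users) >= threshold
        }
        if new_users == spam_users and new_patterns == spam_patterns:
            break  # fixed point reached
        spam_users, spam_patterns = new_users, new_patterns
    return spam_users, spam_patterns


# Toy log: u1 and u2 share a rapid query/click pattern, u3 browses slowly.
user_sessions = {
    "u1": [[("query", "q1", 0.5), ("click", "d1", 0.5)]],
    "u2": [
        [("query", "q1", 0.7), ("click", "d1", 1.0)],
        [("query", "q1", 0.3), ("click", "d2", 0.4)],
    ],
    "u3": [[("query", "q9", 5), ("click", "d5", 40)]],
}
seed = {session_pattern([("query", "q1", 0.5), ("click", "d1", 0.5)])}
spam_users, spam_patterns = propagate(user_sessions, seed)
```

In this toy log the seed pattern flags u1 and u2, and u2's second pattern is then flagged by propagation, while u3 stays unflagged. A production version would need the paper's actual feature definitions and stopping criteria.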
  • Keywords
    Web sites; document handling; graph theory; information retrieval; iterative methods; search engines; unsolicited e-mail; Chinese commercial search engine real log data; Web sites; automatic click spam detection approach; bipartite graph iterative algorithm; cheating clicks; cheating methods; click information; daily life; document ranking; information retrieval; real world commercial search engine service; search engine click spam detection; single click record detection; user session model; Algorithm design and analysis; Bipartite graph; Classification algorithms; Feature extraction; Iterative methods; Search engines; Unsolicited electronic mail; Bipartite graph iterative algorithm; Click spam; User session;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud Computing and Intelligent Systems (CCIS), 2012 IEEE 2nd International Conference on
  • Conference_Location
    Hangzhou
  • Print_ISBN
    978-1-4673-1855-6
  • Type

    conf

  • DOI
    10.1109/CCIS.2012.6664324
  • Filename
    6664324