  • DocumentCode
    1335320
  • Title
    A Comprehensive Approach to Image Spam Detection: From Server to Client Solution
  • Author
    Gao, Yan; Choudhary, Alok; Hua, Gang
  • Author_Institution
    Dept. of Electr. Eng. & Comput. Sci., Northwestern Univ., Evanston, IL, USA
  • Volume
    5
  • Issue
    4
  • fYear
    2010
  • Firstpage
    826
  • Lastpage
    836
  • Abstract
    Image spam is a type of e-mail spam that embeds spam text content into graphical images to bypass traditional text-based e-mail spam filters. To effectively detect image spam, it is desirable to leverage image content analysis technologies. However, most previous work on image spam detection focuses on filtering image spam on the client side. We propose a more comprehensive solution that combines server-side filtering and client-side detection to effectively mitigate image spam. On the server side, we present a nonnegative sparsity-induced similarity measure for cluster analysis of spam images, to filter the attack activities of spammers and quickly trace back the spam sources. On the client side, we employ the principle of active learning, where the learner guides the users to label as few images as possible while maximizing the classification accuracy. The server-side filtering identifies large image clusters as suspicious spam sources, and further analysis can be performed to identify the real sources and block them from the beginning. For those spam images that survive the server-side filter, our active learner on the client side further guides the users to filter them out interactively and efficiently. Our experiments on an image spam data-set collected from the e-mail server of our department demonstrate the efficacy of the proposed comprehensive solution.
  • Keywords
    e-mail filters; filtering theory; image recognition; learning (artificial intelligence); object detection; pattern clustering; text analysis; unsolicited e-mail; active learning; client-side detection; cluster analysis; e-mail server; graphical images; image spam data-set; image spam detection; image spam mitigation; nonnegative sparsity; server-side filtering; spam text content; text-based e-mail spam filters; Algorithm design and analysis; Classification algorithms; Electronic mail; Image recognition; Optical character recognition software; Unsolicited electronic mail; Visualization; Active learning; clustering; image recognition; image spam; spam filtering; sparse representation;
  • fLanguage
    English
  • Journal_Title
    Information Forensics and Security, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1556-6013
  • Type
    jour
  • DOI
    10.1109/TIFS.2010.2080267
  • Filename
    5585752