• DocumentCode
    2727532
  • Title
    PSSF: A Novel Statistical Approach for Personalized Service-side Spam Filtering
  • Author
    Junejo, Khurum Nazir; Karim, Asim
  • fYear
    2007
  • fDate
    2-5 Nov. 2007
  • Firstpage
    228
  • Lastpage
    234
  • Abstract
    The volume of spam e-mail has grown rapidly in the last two years, resulting in increasing costs to users, network operators, and e-mail service providers (ESPs). E-mail users demand accurate spam filtering with minimal effort on their part. Since the distribution of spam and non-spam e-mails often differs from user to user, a single filter trained on a general corpus is not optimal for all users. The question facing ESPs is: how can robust and scalable automatic personalized spam filters be built? We address this question by presenting PSSF, a novel statistical approach for personalized service-side spam filtering. PSSF builds a discriminative classifier from a statistical model of spam and non-spam e-mails. A classifier is first built on a general training corpus and is then adapted to each user through one or more passes of soft labeling and classifier rebuilding over that user's unlabeled e-mails. The statistical model captures the distribution of tokens in spam and non-spam e-mails. This model is robust in the sense that its size can be reduced significantly without degrading filtering performance. We evaluate PSSF on two datasets. The results demonstrate the superior performance and scalability of PSSF in comparison with other published results on the same datasets.
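    The adaptation loop described in the abstract can be sketched in Python. It first trains on a general corpus and then alternates soft labeling and rebuilding over one user's unlabeled e-mails. This is a hedged illustration, not the authors' PSSF. The following choices are all assumptions: the smoothed log-ratio token weights (a naive-Bayes-like linear scorer standing in for the paper's discriminative classifier), the logistic soft labels, rebuilding from the user's e-mails alone, and the top_k pruning used to mimic model-size reduction. Every function and parameter name is hypothetical.

```python
import math
import re
from collections import defaultdict


def tokens(text):
    """Hypothetical tokenizer: lowercase alphanumeric runs, counted once per e-mail."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def build_model(emails, spam_probs, top_k=None, alpha=1.0):
    """Learn one weight per token from hard (0/1) or soft (0..1) spam labels.

    Assumption: a token's weight is the log ratio of its Laplace-smoothed
    occurrence rates in spam vs. non-spam e-mails; a soft label splits each
    e-mail's count between the two classes.
    """
    spam_df = defaultdict(float)
    ham_df = defaultdict(float)
    n_spam = sum(spam_probs)
    n_ham = len(spam_probs) - n_spam
    for text, p in zip(emails, spam_probs):
        for t in tokens(text):
            spam_df[t] += p
            ham_df[t] += 1.0 - p
    weights = {}
    for t in spam_df:  # every token seen was added to both tables
        p_s = (spam_df[t] + alpha) / (n_spam + 2 * alpha)
        p_h = (ham_df[t] + alpha) / (n_ham + 2 * alpha)
        weights[t] = math.log(p_s / p_h)
    if top_k is not None:
        # Keep only the top_k most discriminative tokens to shrink the model.
        keep = sorted(weights, key=lambda t: abs(weights[t]), reverse=True)[:top_k]
        weights = {t: weights[t] for t in keep}
    return weights


def spam_prob(weights, text):
    """Soft label: logistic function of the summed weights of known tokens."""
    score = sum(weights.get(t, 0.0) for t in tokens(text))
    score = max(-50.0, min(50.0, score))  # avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-score))


def personalize(general_emails, general_labels, user_emails, passes=2, top_k=None):
    """Train on a labeled general corpus, then adapt to one user's unlabeled e-mails."""
    weights = build_model(general_emails, [float(y) for y in general_labels], top_k)
    for _ in range(passes):
        soft = [spam_prob(weights, e) for e in user_emails]  # soft labeling pass
        weights = build_model(user_emails, soft, top_k)      # classifier rebuild
    return weights
```

    In this sketch, personalize(general, labels, user_inbox) returns a token-weight dictionary that spam_prob applies to that user's mail. Soft labels let e-mails the general model is unsure about contribute only fractionally to each class, which is one plausible reading of "soft labeling" here.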
  • Keywords
    Costs; Degradation; Electronic mail; Electrostatic precipitators; Filtering; Filters; Labeling; Robustness; Scalability; Unsolicited electronic mail;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, IEEE/WIC/ACM International Conference on
  • Conference_Location
    Fremont, CA
  • Print_ISBN
    978-0-7695-3026-0
  • Type
    conf
  • DOI
    10.1109/WI.2007.47
  • Filename
    4427092