• DocumentCode
    2490853
  • Title
    Content-based spam filtering
  • Author
    Almeida, Tiago A.; Yamakami, Akebo
  • Author_Institution
    Sch. of Electr. & Comput. Eng., Univ. of Campinas - UNICAMP, Campinas, Brazil
  • fYear
    2010
  • fDate
    18-23 July 2010
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    The growing number of email users has led to a dramatic increase in spam emails. Fortunately, several approaches can automatically detect and remove most of these messages; the best known are based on Bayesian decision theory and Support Vector Machines. However, there are several forms of Naive Bayes filters, something the anti-spam literature does not always acknowledge. In this paper, we discuss seven different versions of Naive Bayes classifiers and compare them with the well-known linear Support Vector Machine on six non-encoded datasets. Moreover, we propose a new measure for evaluating the quality of anti-spam classifiers: we investigate the benefits of using the Matthews correlation coefficient as the measure of performance.
  • Keywords
    Bayes methods; content-based retrieval; information filtering; support vector machines; unsolicited e-mail; Bayesian decision theory; Matthews correlation coefficient; content-based spam filtering; email users; linear support vector machine; naive Bayes classifiers; Electronic mail; Gaussian distribution; Manganese; Niobium; Support vector machines; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), The 2010 International Joint Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1098-7576
  • Print_ISBN
    978-1-4244-6916-1
  • Type
    conf
  • DOI
    10.1109/IJCNN.2010.5596569
  • Filename
    5596569