DocumentCode
2490853
Title
Content-based spam filtering
Author
Almeida, Tiago A. ; Yamakami, Akebo
Author_Institution
Sch. of Electr. & Comput. Eng., Univ. of Campinas - UNICAMP, Campinas, Brazil
fYear
2010
fDate
18-23 July 2010
Firstpage
1
Lastpage
7
Abstract
The growth in the number of email users has led to a dramatic increase in spam email. Fortunately, several approaches can automatically detect and remove most of these messages; the best known are based on Bayesian decision theory and Support Vector Machines. However, there are several forms of Naive Bayes filters, something the anti-spam literature does not always acknowledge. In this paper, we discuss seven different versions of Naive Bayes classifiers and compare them with the well-known linear Support Vector Machine on six non-encoded datasets. Moreover, we propose a new measure for evaluating the quality of anti-spam classifiers: we investigate the benefits of using the Matthews correlation coefficient as the measure of performance.
Keywords
Bayes methods; content-based retrieval; information filtering; support vector machines; unsolicited e-mail; Bayesian decision theory; Matthews correlation coefficient; content-based spam filtering; email users; linear support vector machine; naive Bayes classifiers; Electronic mail; Gaussian distribution; Manganese; Niobium; Support vector machines; Training
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks (IJCNN), The 2010 International Joint Conference on
Conference_Location
Barcelona
ISSN
1098-7576
Print_ISBN
978-1-4244-6916-1
Type
conf
DOI
10.1109/IJCNN.2010.5596569
Filename
5596569