Title :
Some empirical results on two spam detection methods
Author :
Matsumoto, Ryota ; Zhang, Du ; Lu, Meiliu
Author_Institution :
Dept. of Comput. Sci., California State Univ., Sacramento, CA, USA
Abstract :
In this paper, we describe the results of an empirical study of two spam detection methods: support vector machines (SVMs) and the naive Bayes classifier (NBC). To conduct the study, we implement the NBC and use SVMlight, an implementation of SVMs developed by Thorsten Joachims. The NBC and linear SVMs with different values of the C parameter are trained on a set of 2000 emails (1000 spam and 1000 nonspam) and tested on 200 new emails, 100 in each class. A program is written that converts emails into feature vectors using both TF and TF-IDF term weighting methods. The evaluation criteria include accuracy rate, recall, precision, miss rate, and false alarm rate. The results indicate that both approaches have their pros and cons.
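The evaluation criteria named in the abstract can be illustrated with a minimal sketch. It assumes spam is the positive class and uses the common confusion-matrix definitions (miss rate as the fraction of spam classified as nonspam, false alarm rate as the fraction of nonspam classified as spam); the abstract does not spell out its formulas, so these definitions are an assumption. The function name `spam_metrics` and the counts below are hypothetical and are not results from the paper.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def spam_metrics(tp, fp, tn, fn):
    """Compute evaluation criteria from confusion-matrix counts.

    Assumed convention (not stated in the abstract): spam is the positive class.
      tp: spam classified as spam        fn: spam classified as nonspam
      tn: nonspam classified as nonspam  fp: nonspam classified as spam
    """
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "recall": safe_div(tp, tp + fn),
        "precision": safe_div(tp, tp + fp),
        "miss_rate": safe_div(fn, tp + fn),         # equals 1 - recall
        "false_alarm_rate": safe_div(fp, fp + tn),
    }


# Hypothetical counts for a 200-email test set (100 spam, 100 nonspam).
# These numbers are illustrative only and do not come from the paper.
metrics = spam_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions, miss rate and recall always sum to 1, so the two criteria carry the same information; the false alarm rate is the one that captures legitimate mail wrongly flagged as spam.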
Keywords :
Bayes methods; pattern classification; support vector machines; unsolicited e-mail; C parameters; TF term weighting method; TF-IDF term weighting method; accuracy rate; email; false alarm rate; feature vectors; miss rate; naive Bayes classifier; nonspams; precision; recall; spam detection methods; support vector machines; Computer science; Ducts; Electronic mail; Internet; Niobium compounds; Support vector machine classification; Support vector machines; Testing; Text categorization; Unsolicited electronic mail;
Conference_Title :
Information Reuse and Integration, 2004. IRI 2004. Proceedings of the 2004 IEEE International Conference on
Print_ISBN :
0-7803-8819-4
DOI :
10.1109/IRI.2004.1431460