Title :
Cascaded Simple Filters for Accurate and Lightweight Email-Spam Detection
Author_Institution :
Dept. Appl. Inf., Hosei Univ., Tokyo, Japan
Abstract :
Accurate spam filters, such as the Bayesian filter, incur a high cost for off-line training (or learning) based on the analysis of a large email corpus. This paper presents cascaded simple, i.e., rule-based, filters for accurate and lightweight detection of email spam. We cascade three filters that classify email based on, respectively, the fingerprints of message bodies, the white and black lists of email addresses in the From header, and the words specific to spam and legitimate email in the Subject header. Our filters need no training; instead, they collect the above information by themselves while they are working, especially when the user notifies them of a false negative decision (classifying spam as legitimate). We show by experiments with about 20,000 real-world emails that the cascaded simple filters achieve a false negative rate of about 0.025 with no false positives (classifying legitimate email as spam) and a high throughput of about 90 emails per second.
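The three-stage cascade described in the abstract might be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the class name `CascadedFilter`, the SHA-256 fingerprint over a whitespace-normalized body, the word-count comparison in the Subject stage, and the update rules in `report_false_negative` are all assumptions made for the example. The abstract does not specify any of these details.

```python
import hashlib
import re

SPAM, HAM = "spam", "legitimate"


class CascadedFilter:
    """Hypothetical three-stage rule-based cascade (all rules are assumptions)."""

    def __init__(self):
        self.spam_fingerprints = set()  # stage 1: fingerprints of known spam bodies
        self.whitelist = set()          # stage 2: trusted From addresses
        self.blacklist = set()          # stage 2: known spam From addresses
        self.spam_words = set()         # stage 3: Subject words seen in spam
        self.ham_words = set()          # stage 3: Subject words seen in legitimate mail

    @staticmethod
    def fingerprint(body):
        # Assumed fingerprint: hash of the lowercased, whitespace-collapsed body.
        normalized = " ".join(body.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    @staticmethod
    def words(subject):
        return set(re.findall(r"[a-z0-9]+", subject.lower()))

    def classify(self, sender, subject, body):
        # Stage 1: message-body fingerprint.
        if self.fingerprint(body) in self.spam_fingerprints:
            return SPAM
        # Stage 2: white and black lists of From addresses.
        sender = sender.lower()
        if sender in self.blacklist:
            return SPAM
        if sender in self.whitelist:
            return HAM
        # Stage 3: spam-specific versus legitimate-specific Subject words.
        subject_words = self.words(subject)
        spam_hits = len(subject_words & self.spam_words)
        ham_hits = len(subject_words & self.ham_words)
        if spam_hits > ham_hits:
            return SPAM
        # Default to legitimate, so that doubt never causes a false positive.
        return HAM

    def report_false_negative(self, sender, subject, body):
        # The user reports spam that was delivered as legitimate; the filter
        # records what it can from that message (assumed update rules).
        sender = sender.lower()
        self.spam_fingerprints.add(self.fingerprint(body))
        self.whitelist.discard(sender)
        self.blacklist.add(sender)
        self.spam_words |= self.words(subject) - self.ham_words
```

In this sketch a new filter starts empty and accepts everything; each call to `report_false_negative` then lets later messages with the same body, the same sender, or similar Subject words be caught by the corresponding stage.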
Keywords :
Bayes methods; e-mail filters; learning (artificial intelligence); security of data; unsolicited e-mail; Bayesian filter; cascaded simple filter; email addresses; false negative decision; lightweight email spam detection; message body fingerprint; off line training; rule based filter; subject header; Accuracy; Bayesian methods; Machine learning; Pattern matching; Training; Unsolicited electronic mail; Spam filter; cascaded filters; ensemble filter; fingerprint; rule-based; white/black lists;
Conference_Title :
Emerging Security Information Systems and Technologies (SECURWARE), 2010 Fourth International Conference on
Conference_Location :
Venice
Print_ISBN :
978-1-4244-7517-9
Electronic_ISBN :
978-0-7695-4095-5
DOI :
10.1109/SECURWARE.2010.34