Classification of email using BeaKS: Behavior and keyword stemming

Author

Bhat, Veena H. ; Malkani, Vandana R. ; Shenoy, P. Deepa ; Venugopal, K.R. ; Patnaik, L.M.

Author_Institution

Dept. of CSE, Univ. Visvesvaraya, Bangalore, India

fYear

2011

fDate

21-24 Nov. 2011

Firstpage

1139

Lastpage

1143

Abstract

Spam mails are one of the greatest challenges faced alike by internet service providers, organizations and internet users. Spam mails may be targeted, sent with malicious intent, or sent simply as commercial marketing; on the whole they are unwanted by everyone except the sender. Spam filters evolve continuously as spammers become more technically savvy and creative. Machine learning algorithms are widely used to classify mails as spam or ham (legitimate email). This work presents a spam filter, BeaKS, whose focused preprocessing phase combines the content of the email with two behavioral characteristics extracted from it to predict the category a mail belongs to: spam or ham. The accuracy of the proposed prediction model, using Random Forests as the classifier, is shown to be superior to that of other recent techniques. The approach is simple, easy to implement and reliable.

Keywords

learning (artificial intelligence); pattern classification; security of data; unsolicited e-mail; BeaKS; behavioral characteristics; classifier; email classification; ham mails; keyword stemming; machine learning algorithms; random forests; spam filters; spam mails; Accuracy; Artificial neural networks; Feature extraction; Niobium; Postal services; Unsolicited electronic mail; Email classification; email content; machine learning; random forests; spammer behaviour

fLanguage

English

Publisher

IEEE

Conference_Titel

TENCON 2011 - 2011 IEEE Region 10 Conference

Conference_Location

Bali

ISSN

2159-3442

Print_ISBN

978-1-4577-0256-3

Type

conf

DOI

10.1109/TENCON.2011.6129290

Filename

6129290