Filtering spam e-mail with Generalized Additive Neural Networks

Author

Du Toit, Tiny ; Kruger, Hennie

Author_Institution

Sch. of Comput., Stat. & Math. Sci., North-West Univ., Potchefstroom, South Africa

fYear

2012

fDate

15-17 Aug. 2012

Firstpage

1

Lastpage

8

Abstract

Some of the major security risks associated with spam e-mail are the spreading of computer viruses and the facilitation of phishing exercises. Spam is therefore regarded as one of the prominent security threats in modern organizations. Security controls, such as spam filtering techniques, have become increasingly important to protect information and information assets. In this paper the performance of a Generalized Additive Neural Network on a publicly available e-mail corpus is investigated in the context of statistical spam filtering. The neural network is compared to a Naive Bayesian classifier and a Memory-based technique. Generalized Additive Neural Networks have a number of advantages over neural networks in general. An automated construction algorithm performs feature and model selection simultaneously and produces results that can be interpreted by a graphical method. This algorithm is powerful and effective, and is highly accurate compared to other non-linear model selection methods. The paper also considers the impact of different feature set sizes using cost-sensitive measures. These criteria are sensitive to the cost difference between the two common types of errors made by filtering systems: blocking a legitimate e-mail and letting a spam e-mail through. Experiments show that the Generalized Additive Neural Network outperforms the Naive Bayes and Memory-based classifiers when misclassified legitimate e-mails are assigned the same cost as misclassified spam e-mails. This result suggests Generalized Additive Neural Networks may be utilized to flag spam e-mails in order to prioritize the reading of messages.
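
The abstract does not name the cost-sensitive measures it uses. As a minimal sketch, assuming the weighted accuracy and total cost ratio that are common in the statistical spam filtering literature, the idea can be illustrated as follows. The function names, argument names and the cost factor `lam` are hypothetical and are not taken from the paper; `lam = 1` corresponds to the equal-cost setting mentioned in the abstract.

```python
# Illustrative sketch only: the measures and all names here are assumptions,
# not the paper's confirmed evaluation code.

def weighted_accuracy(n_ll, n_ls, n_sl, n_ss, lam=1.0):
    """Accuracy in which each legitimate e-mail counts as `lam` messages.

    n_ll: legitimate e-mails classified as legitimate
    n_ls: legitimate e-mails classified as spam (the costly error)
    n_sl: spam e-mails classified as legitimate
    n_ss: spam e-mails classified as spam
    """
    n_legit = n_ll + n_ls
    n_spam = n_sl + n_ss
    return (lam * n_ll + n_ss) / (lam * n_legit + n_spam)


def total_cost_ratio(n_ls, n_sl, n_ss, lam=1.0):
    """Cost of using no filter divided by the cost of using the filter.

    Values above 1 mean the filter is better than no filter at all.
    """
    n_spam = n_sl + n_ss
    filter_cost = lam * n_ls + n_sl
    if filter_cost == 0:
        return float("inf")  # the filter made no errors
    return n_spam / filter_cost


if __name__ == "__main__":
    # Hypothetical confusion counts, not results from the paper.
    counts = dict(n_ll=90, n_ls=10, n_sl=20, n_ss=80)
    for lam in (1.0, 9.0):
        wacc = weighted_accuracy(lam=lam, **counts)
        tcr = total_cost_ratio(counts["n_ls"], counts["n_sl"], counts["n_ss"], lam=lam)
        print(f"lam={lam}: weighted accuracy={wacc:.3f}, total cost ratio={tcr:.3f}")
```

With the same hypothetical counts, raising `lam` from 1 to 9 pushes the total cost ratio below 1, which shows how a filter that looks useful under equal costs can become worse than no filter once blocked legitimate e-mail is penalized more heavily.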

Keywords

Bayes methods; computer viruses; information filtering; neural nets; unsolicited e-mail; Naive Bayesian classifier; automated construction algorithm; e-mail corpus; filtering spam e-mail; generalized additive neural networks; graphical method; information assets; information protection; memory based technique; phishing exercises; security controls; security risks; security threats; spam filtering techniques; statistical spam filtering; Accuracy; Additives; Bayesian methods; Biological neural networks; Unsolicited electronic mail; Generalized Additive Neural Network; Memory-based classifier; Neural Network; Security risk; Spam; Spam filtering;

fLanguage

English

Publisher

IEEE

Conference_Titel

Information Security for South Africa (ISSA), 2012

Conference_Location

Johannesburg, Gauteng

Print_ISBN

978-1-4673-2160-0

Type

conf

DOI

10.1109/ISSA.2012.6320446

Filename

6320446