Title :
Spam Detection Using Feature Selection and Parameters Optimization
Author :
Lee, Sang Min ; Kim, Dong Seong ; Kim, Ji Ho ; Park, Jong Sou
Author_Institution :
Dept. of Comput. Eng., Korea Aerosp. Univ., Seoul, South Korea
Abstract :
Spam is no longer mere garbage but a risk, since it now carries virus attachments and spyware agents that can ruin recipients' systems; there is therefore an emerging need for spam detection. Many spam detection techniques based on machine learning algorithms have been proposed. As the amount of spam has increased tremendously through bulk mailing tools, spam detection techniques must cope with this volume. For spam detection, parameter optimization and feature selection have been proposed to reduce processing overhead while guaranteeing high detection rates. However, previous approaches have not taken into account variable importance or the optimal number of features, and no approach so far has used parameter optimization and feature selection together. In this paper, we propose an optimal spam detection model based on Random Forests (RF) that enables both parameter optimization and feature selection. We optimize two parameters of RF to maximize the detection rate. We provide the variable importance of each feature so that irrelevant features are easy to eliminate. Furthermore, we determine an optimal number of selected features using two methods: (i) a single parameter optimization for the entire feature selection process, and (ii) parameter optimization in every feature elimination phase. We carry out experiments on the Spambase dataset and show the feasibility of our approach.
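The two ways of choosing the number of features described in the abstract can be sketched as a backward-elimination loop. This is a minimal illustration and not the authors' implementation: the `fit` callback (which would train a Random Forest and return a detection rate plus per-feature importances), the parameter grid, the removal of one lowest-importance feature per phase, and the rule of keeping the best-scoring subset are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, int]
# Hypothetical callback: trains a forest on `features` with `params` and
# returns (detection_rate, {feature_name: importance}).
FitFn = Callable[[Sequence[str], Params], Tuple[float, Dict[str, float]]]


def tune(features: Sequence[str], grid: Sequence[Params], fit: FitFn) -> Params:
    """Return the grid point that gives the highest detection rate."""
    return max(grid, key=lambda p: fit(features, p)[0])


def select_features(
    features: Sequence[str],
    grid: Sequence[Params],
    fit: FitFn,
    retune_each_phase: bool,
) -> Tuple[float, List[str], Params]:
    """Backward elimination driven by variable importance.

    retune_each_phase=False: parameters are tuned once, on the full feature
    set, and reused in every phase (method (i) in the abstract).
    retune_each_phase=True: parameters are tuned again after every
    elimination (method (ii) in the abstract).
    Assumes `features` is non-empty.
    """
    remaining = list(features)
    params = tune(remaining, grid, fit)  # initial tuning, used by both methods
    best = None
    while remaining:
        if retune_each_phase and best is not None:
            params = tune(remaining, grid, fit)
        rate, importance = fit(remaining, params)
        # On ties, prefer the later (smaller) feature subset.
        if best is None or rate >= best[0]:
            best = (rate, list(remaining), params)
        if len(remaining) == 1:
            break
        # Assumption: drop the single least important feature per phase.
        remaining.remove(min(remaining, key=lambda f: importance[f]))
    return best
```

Method (ii) calls `tune` once per phase, so it costs roughly one extra grid search per eliminated feature; method (i) trades that cost for parameters that may no longer be the best fit for the reduced feature set.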
Keywords :
optimisation; security of data; unsolicited e-mail; Spambase dataset; feature selection; parameters optimization; random forests; spam detection; spyware agents; virus attachments; Aerospace engineering; Competitive intelligence; Computer vision; Intelligent agent; Machine learning algorithms; Optimization methods; Radio frequency; Support vector machine classification; Support vector machines; Unsolicited electronic mail; Feature Selection; Intrusion Detection; Parameters Optimization; Random Forests; Spam Detection; Spambase;
Conference_Title :
Complex, Intelligent and Software Intensive Systems (CISIS), 2010 International Conference on
Conference_Location :
Krakow
Print_ISBN :
978-1-4244-5917-9
DOI :
10.1109/CISIS.2010.116