DocumentCode :
2077401
Title :
Spam Detection Using Feature Selection and Parameters Optimization
Author :
Lee, Sang Min ; Kim, Dong Seong ; Kim, Ji Ho ; Park, Jong Sou
Author_Institution :
Dept. of Comput. Eng., Korea Aerosp. Univ., Seoul, South Korea
fYear :
2010
fDate :
15-18 Feb. 2010
Firstpage :
883
Lastpage :
888
Abstract :
Spam is no longer mere garbage but a security risk, since it now often carries virus attachments and spyware agents that can damage the recipient's system; there is therefore an emerging need for spam detection. Many spam detection techniques based on machine learning algorithms have been proposed. Because bulk mailing tools have increased the volume of spam tremendously, spam detection techniques must be able to handle it. Parameter optimization and feature selection have been proposed for spam detection to reduce processing overhead while maintaining high detection rates. However, previous approaches have not taken into account variable importance or the optimal number of features, and so far no approach has used both together. In this paper, we propose an optimal spam detection model based on Random Forests (RF) that supports both parameter optimization and feature selection. We optimize two parameters of RF to maximize detection rates. We report the variable importance of each feature so that irrelevant features can easily be eliminated. Furthermore, we determine an optimal number of selected features using two methods: (i) optimizing the parameters only once for the entire feature selection process, and (ii) re-optimizing the parameters in every feature elimination phase. We carry out experiments on the Spambase dataset and show the feasibility of our approach.
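The two feature-selection strategies described in the abstract can be contrasted with a minimal Python sketch. It assumes backward elimination that drops the lowest-importance feature in each phase and a grid search over two RF parameters. Naming those parameters `mtry` and `ntree` is an assumption (they are common Random Forest tuning knobs); the abstract does not say which two parameters are tuned. The evaluator `toy_evaluate` and its `WEIGHTS` are hypothetical stand-ins for training a Random Forest on Spambase, so the numbers it produces are not results from the paper.

```python
from itertools import product


def grid_search(evaluate, features, grid):
    """Try every combination of parameter values in `grid`; keep the best one."""
    names = sorted(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        rate, importances = evaluate(features, params)
        if best is None or rate > best[1]:
            best = (params, rate, importances)
    return best  # (params, detection_rate, importances)


def eliminate(evaluate, features, grid, reoptimize_each_phase):
    """Backward elimination driven by variable importance.

    reoptimize_each_phase=False: tune parameters once, reuse them (method i).
    reoptimize_each_phase=True: re-run the grid search after every removal (method ii).
    Returns the feature subset with the highest detection rate seen.
    """
    current = list(features)
    params, rate, importances = grid_search(evaluate, current, grid)
    best_subset, best_rate = list(current), rate
    while len(current) > 1:
        # Drop the feature the current model considers least important.
        weakest = min(current, key=lambda f: importances[f])
        current.remove(weakest)
        if reoptimize_each_phase:
            params, rate, importances = grid_search(evaluate, current, grid)
        else:
            rate, importances = evaluate(current, params)
        if rate > best_rate:
            best_subset, best_rate = list(current), rate
    return best_subset, best_rate


# Hypothetical toy data: f3 and f4 are pure noise, f0..f2 carry signal.
WEIGHTS = {"f0": 0.30, "f1": 0.25, "f2": 0.20, "f3": 0.0, "f4": 0.0}


def toy_evaluate(features, params):
    """Hypothetical stand-in for training an RF and measuring its detection rate."""
    signal = sum(WEIGHTS[f] for f in features)
    noise = 0.02 * sum(1 for f in features if WEIGHTS[f] == 0.0)
    # Pretend mtry=2 and ntree=100 is the best-performing setting.
    bonus = (0.10 if params["mtry"] == 2 else 0.0) + (0.05 if params["ntree"] == 100 else 0.0)
    importances = {f: WEIGHTS[f] for f in features}
    return signal - noise + bonus, importances


GRID = {"mtry": [1, 2], "ntree": [50, 100]}  # hypothetical parameter grid
FEATURES = list(WEIGHTS)

if __name__ == "__main__":
    for mode in (False, True):
        subset, rate = eliminate(toy_evaluate, FEATURES, GRID, reoptimize_each_phase=mode)
        # On this toy data both modes keep ['f0', 'f1', 'f2'].
        print(mode, subset, round(rate, 2))
```

Setting `reoptimize_each_phase=False` corresponds to method (i) and `True` to method (ii). The second costs one full grid search per elimination phase, trading extra processing for parameters that are tuned to each candidate feature subset.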
Keywords :
optimisation; security of data; unsolicited e-mail; Spambase dataset; feature selection; parameters optimization; random forests; spam detection; spyware agents; virus attachments; Aerospace engineering; Competitive intelligence; Computer vision; Intelligent agent; Machine learning algorithms; Optimization methods; Radio frequency; Support vector machine classification; Support vector machines; Unsolicited electronic mail; Feature Selection; Intrusion Detection; Parameters Optimization; Random Forests; Spam Detection; Spambase;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Complex, Intelligent and Software Intensive Systems (CISIS), 2010 International Conference on
Conference_Location :
Krakow
Print_ISBN :
978-1-4244-5917-9
Type :
conf
DOI :
10.1109/CISIS.2010.116
Filename :
5447486