Title :
Analysis of the evolution of features in classification problems with concept drift: Application to spam detection
Author :
Henke, Marcia ; Souto, Eduardo ; dos Santos, Eulanda M.
Author_Institution :
Inst. of Comput., Fed. Univ. of Amazonas - Manaus, Manaus, Brazil
Abstract :
Machine learning solutions for concept drift detection problems try to decide to what extent a particular set of examples still represents the current concept, rather than treating all data equally. Monitoring the set of relevant features used to generate the classification model may be an effective strategy for concept drift detection. This paper analyzes whether drifts can be detected by monitoring feature evolution in the spam detection problem. Experimental results show that the relevant features of the target domain differ significantly from those of the source domain. This opens a new way to analyze the relationship between feature evolution and misclassification rate. The experiments were conducted on two databases: a public database of samples collected between 2003 and 2004, and a new private database of samples collected between 2012 and 2013.
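The abstract does not state which relevance measure or comparison the authors use, so the following is only a minimal sketch of the general idea of feature evolution monitoring, under loudly stated assumptions. It treats each message as a set of word tokens (binary presence features), ranks tokens by information gain with respect to the spam/ham label, and compares the top-k feature sets of an older (source) window and a newer (target) window by Jaccard overlap. The function names, the choice of information gain, and the values of `k` and `threshold` are hypothetical illustrations, not the paper's method.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a label distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(messages, labels, token):
    """Information gain of the binary feature 'token appears in the message'."""
    base = entropy(list(Counter(labels).values()))
    with_tok = [y for m, y in zip(messages, labels) if token in m]
    without_tok = [y for m, y in zip(messages, labels) if token not in m]
    n = len(labels)
    conditional = 0.0
    for subset in (with_tok, without_tok):
        if subset:
            conditional += len(subset) / n * entropy(list(Counter(subset).values()))
    return base - conditional


def top_k_features(messages, labels, k):
    """Return the k tokens with the highest information gain (ties broken alphabetically)."""
    vocab = set().union(*messages)
    ranked = sorted(vocab, key=lambda t: (-information_gain(messages, labels, t), t))
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def drift_suspected(src_msgs, src_labels, tgt_msgs, tgt_labels, k=2, threshold=0.5):
    """Hypothetical rule: flag drift when the relevant-feature sets overlap too little."""
    overlap = jaccard(top_k_features(src_msgs, src_labels, k),
                      top_k_features(tgt_msgs, tgt_labels, k))
    return overlap < threshold
```

For example, if spam in an older window is dominated by one vocabulary and spam in a newer window by another, the two top-k sets barely overlap and the rule above flags a possible drift, while comparing a window with itself yields an overlap of 1.0 and no flag. A fixed threshold is the simplest choice for illustration; a practical monitor would more likely track the overlap over many consecutive windows alongside the misclassification rate.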
Keywords :
learning (artificial intelligence); pattern classification; unsolicited e-mail; classification problems; concept drift detection problems; feature evolution monitoring; machine learning; misclassification rate; spam detection problem; Databases; Error analysis; Feature extraction; Monitoring; Training; Unsolicited electronic mail; concept drift; feature evolution; spam;
Conference_Title :
Integrated Network Management (IM), 2015 IFIP/IEEE International Symposium on
Conference_Location :
Ottawa, ON
DOI :
10.1109/INM.2015.7140398