DocumentCode :
3079050
Title :
Malware detection by text and data mining
Author :
Sundarkumar, G. Ganesh ; Ravi, Vignesh
Author_Institution :
Center of Excellence in CRM & Analytics, Inst. for Dev. & Res. in Banking Technol., Hyderabad, India
fYear :
2013
fDate :
26-28 Dec. 2013
Firstpage :
1
Lastpage :
6
Abstract :
Cyber frauds are a major security threat to the banking industry worldwide, and malware is one of their manifestations. Malware authors use Application Programming Interface (API) calls to perpetrate these crimes. In this paper, we propose a static analysis method that detects malware from API call sequences by using text mining and data mining in tandem. We analyzed the dataset available from the CSMINING group. First, we employed text mining to extract features from the dataset, which consists of a series of API calls. Next, we used mutual information for feature selection. Then, we applied over-sampling to balance the dataset. Finally, we employed various data mining techniques such as Decision Tree (DT), Multilayer Perceptron (MLP), Support Vector Machine (SVM), Probabilistic Neural Network (PNN) and Group Method of Data Handling (GMDH). We also applied One-Class SVM (OCSVM). Throughout the paper, we used 10-fold cross-validation to evaluate the techniques. We observed that SVM and OCSVM achieved 100% sensitivity after the dataset was balanced.
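The preprocessing steps named in the abstract (text-mining features from API call sequences, mutual-information feature selection, over-sampling, and 10-fold cross-validation splits) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the text representation, the MI estimator, or the over-sampling variant, so this sketch assumes binary presence of API-call bigrams as "terms", MI computed from presence/label counts, and plain random over-sampling with replacement. All function names (`api_ngrams`, `mutual_information`, `select_top_k`, `vectorize`, `random_oversample`, `kfold_indices`) are hypothetical, the toy API sequences are illustrative and not taken from the CSMINING dataset, and the classifiers themselves (DT, MLP, SVM, PNN, GMDH, OCSVM) are omitted.

```python
import math
import random
from collections import Counter


def api_ngrams(calls, n=2):
    """Assumed text-mining step: turn one API-call sequence into a set of n-gram terms."""
    return {" ".join(calls[i:i + n]) for i in range(len(calls) - n + 1)}


def mutual_information(docs, labels, term):
    """MI (in bits) between the binary presence of `term` and the class label."""
    n = len(docs)
    joint = Counter((term in d, y) for d, y in zip(docs, labels))
    present = Counter(term in d for d in docs)
    classes = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((present[x] / n) * (classes[y] / n)))
    return mi


def select_top_k(docs, labels, k):
    """Rank every term by MI (ties broken alphabetically) and keep the top k."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-mutual_information(docs, labels, t), t))
    return ranked[:k]


def vectorize(docs, features):
    """Binary document-term matrix over the selected features."""
    return [[1 if f in d else 0 for f in features] for d in docs]


def random_oversample(X, y, seed=0):
    """Assumed balancing step: duplicate random rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, cnt in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == cls]
        for _ in range(target - cnt):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(cls)
    return X_out, y_out


def kfold_indices(n, k=10, seed=0):
    """Shuffle row indices once and deal them into k disjoint folds (not stratified)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


# Toy, illustrative API-call sequences (hypothetical, not from the paper's data).
seqs = [
    ["CreateFileW", "ReadFile", "CloseHandle"],
    ["CreateFileW", "WriteFile", "CloseHandle"],
    ["RegOpenKeyExW", "RegQueryValueExW", "RegCloseKey"],
    ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"],
]
labels = ["benign", "benign", "benign", "malware"]

docs = [api_ngrams(s) for s in seqs]
features = select_top_k(docs, labels, k=3)
X = vectorize(docs, features)
X_bal, y_bal = random_oversample(X, labels)
folds = kfold_indices(len(y_bal), k=3)  # k=10 in the paper; 3 here because the toy set is tiny
```

On this toy input the three bigrams that occur only in the malicious sequence get the highest MI (equal to the label entropy, about 0.811 bits), and over-sampling appends two copies of the malicious row. One design note: if over-sampling is applied before the folds are drawn, duplicated rows can land in both the training and test folds of the same split; applying it inside each training fold avoids that overlap.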
Keywords :
application program interfaces; data mining; decision trees; feature extraction; invasive software; neural nets; support vector machines; text analysis; API call sequences; DT; GMDH; MLP; Malware authors; OCSVM; PNN; SVM; application programming interface; cyber frauds; data mining; decision tree; feature extraction; feature selection; group method for data handling; malware detection; multi layer perceptron; one class SVM; probabilistic neural network; security threat; static analysis method; support vector machine; text mining; Accuracy; Feature extraction; Malware; Mutual information; Support vector machines; Text mining; Application Programming Interface calls; Data Mining; Mutual Information; Over Sampling; Text Mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Intelligence and Computing Research (ICCIC), 2013 IEEE International Conference on
Conference_Location :
Enathi
Print_ISBN :
978-1-4799-1594-1
Type :
conf
DOI :
10.1109/ICCIC.2013.6724229
Filename :
6724229