DocumentCode
262037
Title
A Practical Guide for Detecting the Java Script-Based Malware Using Hidden Markov Models and Linear Classifiers
Author
Cosovan, Doina ; Benchea, Razvan ; Gavrilut, Dragos
Author_Institution
Bitdefender Anti-virus Res. Lab., Romania; Al. I. Cuza Univ. of Iasi, Iasi, Romania
fYear
2014
fDate
22-25 Sept. 2014
Firstpage
236
Lastpage
243
Abstract
The World Wide Web has evolved so rapidly that it is no longer considered a luxury but a necessity. This is why the most popular infection vectors currently used by cybercriminals are either web pages or commonly used documents (such as PDF files). In both cases, the malicious actions are written in JavaScript, which has therefore become the preferred language for spreading malware. To stop malicious content from executing, detecting its infection vector is crucial. In this paper we propose several methods for detecting JavaScript-based attack vectors. To achieve this, we first need to counter the metamorphism techniques commonly used in malicious JavaScript code, which are far from trivial: garbage instruction insertion, variable renaming, equivalent instruction substitution, function permutation, instruction reordering, and so on. Our approach to metamorphism starts by splitting the JavaScript content into components and filtering out the insignificant ones. We then use a dataset of over one million JavaScript files to test several machine learning approaches to malware detection, including Hidden Markov Models, linear classifiers, and hybrid methods. Finally, we analyze these detection methods from a practical point of view, emphasizing the need for a very low false positive rate and the ability to train on large datasets.
Keywords
Java; Web sites; hidden Markov models; invasive software; learning (artificial intelligence); pattern classification; vectors; JavaScript content; JavaScript files; JavaScript malicious code; JavaScript-Based malware detection; JavaScript-based attack vector detection; Web pages; World Wide Web; cybercriminals; equivalent instruction substitution; function permutation; garbage instruction insertion; infection vectors; instruction reordering; linear classifiers; machine learning algorithms; metamorphism techniques; variable renaming; Feature extraction; HTML; Hidden Markov models; Malware; Portable document format; Reactive power; Vectors; Hidden Markov Model; Java Script; Linear Classifier; Machine Learning; PDF; detection; infection vector; malware; metamorphism
fLanguage
English
Publisher
ieee
Conference_Titel
Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2014 16th International Symposium on
Conference_Location
Timisoara
Print_ISBN
978-1-4799-8447-3
Type
conf
DOI
10.1109/SYNASC.2014.39
Filename
7034689