DocumentCode
651574
Title
Malware Similarity Identification Using Call Graph Based System Call Subsequence Features
Author
Blokhin, Kristina ; Saxe, Joshua ; Mentis, David
Author_Institution
Invincea Inc., Fairfax, VA, USA
fYear
2013
fDate
8-11 July 2013
Firstpage
6
Lastpage
10
Abstract
Recent literature has proposed approaches for detecting code-sharing relationships between malware artifacts, which help accelerate the malware reverse engineering process. In this paper we propose a novel code-sharing analysis technique that can complement existing methods. Our algorithm partitions malware system call logs into system call subsequences by identifying places in these logs where the set of saved instruction pointers on the program call stack changes significantly. Each extracted subsequence thus reflects system calls that occur in a local region of the program call graph. We then use the extracted subsequences as features for computing a malware sample similarity matrix. A unique contribution of our method is that it incorporates sequence information into the features it uses to perform similarity analysis but, unlike previously proposed longest common substring methods, runs in linear time. Similarly, our method incorporates call stack information into its features but is computationally far more tractable than previously proposed call graph isomorphism techniques. Because we extract information from sample behavior logs, we avoid the problem of obfuscated samples that resist static analysis tools. We evaluated our method on a corpus of 959 samples and achieved high precision measured against known malware family labels.
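The partition-then-compare idea described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not a detail taken from the paper: the log representation (a system call name paired with a set of saved instruction pointers), the use of the Jaccard index both to decide when the call stack has "changed significantly" and to compare samples, the 0.5 threshold, and the function names `split_log` and `similarity_matrix`.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Assumed log entry: (system call name, saved instruction pointers on the stack).
Event = Tuple[str, FrozenSet[int]]


def jaccard(a, b) -> float:
    """Jaccard index of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def split_log(log: Sequence[Event], threshold: float = 0.5) -> List[Tuple[str, ...]]:
    """Cut the log wherever consecutive call stacks overlap less than `threshold`.

    A single pass over the log, so the cost is linear in the number of events
    (treating stack depth as bounded). The threshold value is an assumption.
    """
    subsequences: List[Tuple[str, ...]] = []
    current: List[str] = []
    previous_stack = None
    for name, stack in log:
        if previous_stack is not None and jaccard(previous_stack, stack) < threshold:
            subsequences.append(tuple(current))
            current = []
        current.append(name)
        previous_stack = stack
    if current:
        subsequences.append(tuple(current))
    return subsequences


def similarity_matrix(logs: Sequence[Sequence[Event]], threshold: float = 0.5) -> List[List[float]]:
    """Pairwise Jaccard similarity over each sample's set of subsequence features."""
    features = [set(split_log(log, threshold)) for log in logs]
    return [[jaccard(a, b) for b in features] for a in features]
```

Treating each subsequence as a hashable tuple lets two samples be compared with plain set operations, which avoids both longest-common-substring search and graph matching. The measure and threshold actually used in the paper may differ.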
Keywords
computational complexity; directed graphs; invasive software; matrix algebra; program diagnostics; reverse engineering; call graph based system call subsequence features; call graph isomorphism techniques; call stack information; code-sharing analysis technique; code-sharing relationship detection; instruction pointers; linear time; malware artifacts; malware family labels; malware reverse engineering process; malware sample similarity matrix; malware similarity identification; malware system call log partitioning; program call graph; program call stack; sequence information; similarity analysis; static analysis tools; Algorithm design and analysis; Clustering algorithms; Feature extraction; Heuristic algorithms; Internet; Malware; Semantics; Call Graph; Malware; Sequence; Similarity; behavior; dynamic analysis; identification;
fLanguage
English
Publisher
IEEE
Conference_Titel
Distributed Computing Systems Workshops (ICDCSW), 2013 IEEE 33rd International Conference on
Conference_Location
Philadelphia, PA
Print_ISBN
978-1-4799-3247-4
Type
conf
DOI
10.1109/ICDCSW.2013.55
Filename
6679854