Title :
Malware Similarity Identification Using Call Graph Based System Call Subsequence Features
Author :
Blokhin, Kristina ; Saxe, Joshua ; Mentis, David
Author_Institution :
Invincea Inc., Fairfax, VA, USA
Abstract :
Recent literature has proposed approaches for detecting code-sharing relationships between malware artifacts, a capability that accelerates the malware reverse engineering process. In this paper we propose a novel code-sharing analysis technique that can complement existing methods. Our algorithm partitions malware system call logs into system call subsequences by identifying the points in each log where the set of saved instruction pointers on the program call stack changes significantly. Each extracted subsequence therefore consists of system calls issued from a local region of the program call graph. We then use the extracted subsequences as features for computing a malware sample similarity matrix. A unique contribution of our method is that its features incorporate sequence information, yet, unlike previously proposed longest common substring methods, it runs in linear time. Similarly, its features incorporate call stack information, yet it is computationally far more tractable than previously proposed call graph isomorphism techniques. Because we extract features from sample behavior logs, we avoid the problem of obfuscated samples that resist static analysis tools. We have evaluated our method on a corpus of 959 samples and achieve high precision given known malware family labels.
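The abstract does not say how a "significant" call stack change is measured or how subsequence features are compared. The following is a minimal Python sketch under explicit assumptions, not the authors' implementation: each log record is assumed to pair a system call name with the set of saved return addresses on the call stack; a split is assumed to occur when the Jaccard similarity of consecutive address sets drops below a hypothetical threshold of 0.5; and samples are assumed to be compared by the Jaccard similarity of their subsequence sets. The names `split_subsequences`, `similarity_matrix`, and the sample logs are illustrative only.

```python
from typing import AbstractSet, FrozenSet, List, Sequence, Tuple

# Assumed log record: (system call name, set of saved return addresses on the stack)
LogEntry = Tuple[str, FrozenSet[int]]


def jaccard(a: AbstractSet, b: AbstractSet) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def split_subsequences(log: Sequence[LogEntry],
                       threshold: float = 0.5) -> List[Tuple[str, ...]]:
    """Cut the log wherever the call-stack address set changes 'significantly'.

    Assumption: a change is significant when the Jaccard similarity between
    consecutive address sets falls below `threshold` (hypothetical value).
    One pass over the log, so the cost is linear in its length for
    bounded stack depth.
    """
    subsequences: List[Tuple[str, ...]] = []
    current: List[str] = []
    prev_stack: FrozenSet[int] = frozenset()
    for i, (syscall, stack) in enumerate(log):
        if i > 0 and jaccard(prev_stack, stack) < threshold:
            subsequences.append(tuple(current))  # close the current region
            current = []
        current.append(syscall)
        prev_stack = stack
    if current:
        subsequences.append(tuple(current))
    return subsequences


def similarity_matrix(logs: Sequence[Sequence[LogEntry]],
                      threshold: float = 0.5) -> List[List[float]]:
    """Pairwise Jaccard similarity of per-sample subsequence feature sets."""
    features = [set(split_subsequences(log, threshold)) for log in logs]
    n = len(features)
    return [[jaccard(features[i], features[j]) for j in range(n)]
            for i in range(n)]


# Illustrative logs; the addresses are made up.
log_a = [
    ("NtCreateFile", frozenset({0x10, 0x20})),
    ("NtWriteFile", frozenset({0x10, 0x20})),
    ("NtOpenKey", frozenset({0x30, 0x40})),      # stack changes: new region
    ("NtSetValueKey", frozenset({0x30, 0x40})),
]
log_b = [
    ("NtCreateFile", frozenset({0x50, 0x60})),
    ("NtWriteFile", frozenset({0x50, 0x60})),
    ("NtDelayExecution", frozenset({0x70})),     # stack changes: new region
]

if __name__ == "__main__":
    print(split_subsequences(log_a))
    print(similarity_matrix([log_a, log_b]))
```

Here `log_a` splits into `("NtCreateFile", "NtWriteFile")` and `("NtOpenKey", "NtSetValueKey")`, and the two samples share one of three distinct subsequences, giving an off-diagonal similarity of 1/3. Only system call names enter the features, so differing raw addresses between samples (for example under ASLR) do not affect the comparison. Extracting features is linear in log length, but filling the full matrix still requires a comparison for every pair of samples.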
Keywords :
computational complexity; directed graphs; invasive software; matrix algebra; program diagnostics; reverse engineering; call graph based system call subsequence features; call graph isomorphism techniques; call stack information; code-sharing analysis technique; code-sharing relationship detection; instruction pointers; linear time; malware artifacts; malware family labels; malware reverse engineering process; malware sample similarity matrix; malware similarity identification; malware system call log partitioning; program call graph; program call stack; sequence information; similarity analysis; static analysis tools; Algorithm design and analysis; Clustering algorithms; Feature extraction; Heuristic algorithms; Internet; Malware; Semantics; Call Graph; Malware; Sequence; Similarity; behavior; dynamic analysis; identification;
Conference_Title :
Distributed Computing Systems Workshops (ICDCSW), 2013 IEEE 33rd International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
978-1-4799-3247-4
DOI :
10.1109/ICDCSW.2013.55