  • DocumentCode
    651574
  • Title
    Malware Similarity Identification Using Call Graph Based System Call Subsequence Features
  • Author
    Blokhin, Kristina ; Saxe, Joshua ; Mentis, David
  • Author_Institution
    Invincea Inc., Fairfax, VA, USA
  • fYear
    2013
  • fDate
    8-11 July 2013
  • Firstpage
    6
  • Lastpage
    10
  • Abstract
    Recent literature has proposed approaches for detecting code-sharing relationships between malware artifacts, which help accelerate the malware reverse engineering process. In this paper we propose a novel code-sharing analysis technique that can complement existing methods. Our algorithm partitions malware system call logs into system call subsequences by identifying places in these logs where the set of saved instruction pointers on the program call stack changes significantly. The extracted subsequences thus reflect runs of system calls that occur in local regions of the program call graph. We then use these subsequences as features for computing a malware sample similarity matrix. A unique contribution of our method is that it incorporates sequence information into the features it uses for similarity analysis but, unlike previously proposed longest common substring methods, runs in linear time. Similarly, our method incorporates call stack information into its features but is computationally far more tractable than previously proposed call graph isomorphism techniques. Because we extract information from sample behavior logs, we avoid the problem of obfuscated samples that resist static analysis tools. We evaluated our method on a corpus of 959 samples and achieved high precision with respect to known malware family labels.
  • Keywords
    computational complexity; directed graphs; invasive software; matrix algebra; program diagnostics; reverse engineering; call graph based system call subsequence features; call graph isomorphism techniques; call stack information; code-sharing analysis technique; code-sharing relationship detection; instruction pointers; linear time; malware artifacts; malware family labels; malware reverse engineering process; malware sample similarity matrix; malware similarity identification; malware system call log partitioning; program call graph; program call stack; sequence information; similarity analysis; static analysis tools; Algorithm design and analysis; Clustering algorithms; Feature extraction; Heuristic algorithms; Internet; Malware; Semantics; Call Graph; Malware; Sequence; Similarity; behavior; dynamic analysis; identification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Distributed Computing Systems Workshops (ICDCSW), 2013 IEEE 33rd International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    978-1-4799-3247-4
  • Type
    conf
  • DOI
    10.1109/ICDCSW.2013.55
  • Filename
    6679854
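
The partitioning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the event layout, the Jaccard overlap test for a "significant" stack change, the `min_overlap` threshold, and the use of the Jaccard index for sample similarity are all assumptions made here for illustration, since the abstract does not specify them. The function names `split_log` and `similarity_matrix` are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

# One log event: (system call name, set of saved instruction pointers on the stack).
# This layout is an assumption; real behavior logs will differ.
Event = Tuple[str, FrozenSet[int]]
Feature = Tuple[str, ...]


def jaccard(a, b) -> float:
    """Jaccard index of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def split_log(log: Sequence[Event], min_overlap: float = 0.5) -> Set[Feature]:
    """Cut the log wherever consecutive stack sets overlap less than min_overlap.

    Makes a single pass over the log. The threshold is an assumed stand-in
    for the abstract's "changes significantly".
    """
    features: Set[Feature] = set()
    current: List[str] = []
    prev_stack = None
    for name, stack in log:
        if prev_stack is not None and jaccard(prev_stack, stack) < min_overlap:
            features.add(tuple(current))  # close the current subsequence
            current = []
        current.append(name)
        prev_stack = stack
    if current:
        features.add(tuple(current))
    return features


def similarity_matrix(samples: Sequence[Set[Feature]]) -> List[List[float]]:
    """Pairwise Jaccard similarity between per-sample feature sets (assumed metric)."""
    n = len(samples)
    return [[jaccard(samples[i], samples[j]) for j in range(n)] for i in range(n)]


# Two toy logs that share a file-reading routine but differ afterwards.
log_a = [
    ("NtOpenFile", frozenset({1, 2, 3})),
    ("NtReadFile", frozenset({1, 2, 3})),
    ("NtClose", frozenset({1, 2, 3})),
    ("NtCreateKey", frozenset({7, 8})),
    ("NtSetValueKey", frozenset({7, 8})),
]
log_b = [
    ("NtOpenFile", frozenset({4, 5})),
    ("NtReadFile", frozenset({4, 5})),
    ("NtClose", frozenset({4, 5})),
    ("NtConnectPort", frozenset({9})),
]

features_a = split_log(log_a)
features_b = split_log(log_b)
matrix = similarity_matrix([features_a, features_b])
```

In this toy example the two samples share one subsequence feature even though their stack addresses differ, because only the system call names enter the feature. Note that the pairwise matrix itself grows quadratically with the number of samples; the single-pass property applies only to the per-log partitioning step in this sketch.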