Defining malware families based on analyst insights

Author

Gennari, Jeff; French, David

Author_Institution

CERT Program, Carnegie Mellon Univ., Pittsburgh, PA, USA

fYear

2011

fDate

15-17 Nov. 2011

Firstpage

396

Lastpage

401

Abstract

Determining whether arbitrary files are related to known malicious files is often useful in network and host-based defense. Doing so can give network defenders sufficient exemplars of a particular threat to develop comprehensive signatures and heuristics for identifying the threat, leading to decreased response time and improved prevention of a cyber attack. Identifying these malicious families is a complex process involving the categorization of potentially malicious code into sets that share similar features, while being distinguishable from unrelated threats or non-malicious code. Current methods for automatically or manually describing malware families are typically unable to distinguish between indicators derived from the structure of the malware and indicators derived from the behavior of the malware. Further, attempts to cluster potentially related files by mapping them into alternate domains, such as histograms, fuzzy hashes, and Bloom filters, often produce clusters of files derived solely from structural information. These similarity measurements are often very effective on crudely similar files, yet they fail to identify files that have similar or identical behavior and semantics. We propose an analytic method, driven largely by human experience and based on objective criteria, for assigning arbitrary files membership in a malicious code family. We describe a process for iteratively refining the criteria used to select a malicious code family, until those criteria are both necessary and sufficient to distinguish a particular malicious code family. We contrast this process with similar processes, such as antivirus signature generation and automatic and blind classification methods. We formalize this process to describe a roadmap for practitioners of malicious code analysis and to highlight opportunities for improvement and automation of both the process and the observation of relevant criteria. Finally, we provide experimental results of applying this methodology to real-world malware.

Keywords

computer viruses; Bloom filters; analyst insights; analytic method; antivirus signature generation; automatic classification methods; blind classification methods; cyber attack; fuzzy hash; histograms; host-based defense; malicious code family; malicious files; malware family definition; network defenders; threat identification; Algorithm design and analysis; Humans; Implants; Malware; Reverse engineering; Runtime;

fLanguage

English

Publisher

IEEE

Conference_Titel

Technologies for Homeland Security (HST), 2011 IEEE International Conference on

Conference_Location

Waltham, MA

Print_ISBN

978-1-4577-1375-0

Type

conf

DOI

10.1109/THS.2011.6107902

Filename

6107902