Title :
Hierarchical management of large-scale malware data
Author :
Kellogg, Lee ; Ruttenberg, Brian ; O'Connor, Alexander ; Howard, Michael ; Pfeffer, Avi
Author_Institution :
Charles River Analytics, Cambridge, MA, USA
Abstract :
As the pace of generation of new malware accelerates, clustering and classifying newly discovered malware requires new approaches to data management. We describe our Big Data approach to managing malware to support effective and efficient malware analysis on large and rapidly evolving sets of malware. The key element of our approach is a hierarchical organization of the malware, which organizes malware into families, maintains a rich description of the relationships between malware, and facilitates efficient online analysis of new malware as they are discovered. Using clustering evaluation metrics, we show that our system discovers malware families comparable to those produced by traditional hierarchical clustering algorithms, while scaling much better with the size of the data set. We also show the flexibility of our system as it relates to substituting various data representations, methods of comparing malware binaries, clustering algorithms, and other factors. Our approach will enable malware analysts and investigators to quickly understand and quantify changes in the global malware ecosystem.
Keywords :
Big Data; data analysis; invasive software; pattern classification; pattern clustering; Big Data approach; clustering evaluation metrics; data representation; global malware ecosystem; hierarchical clustering algorithm; hierarchical data management; hierarchical malware organization; large-scale malware data; malware analysis; malware binaries; malware classification; malware clustering; malware generation; Algorithm design and analysis; Big data; Clustering algorithms; Databases; Malware; Organizations;
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/BigData.2014.7004290