Title :
A parameter-free hybrid clustering algorithm used for malware categorization
Author :
Han, ZhiXue ; Feng, Shaorong ; Ye, Yanfang ; Jiang, Qingshan
Author_Institution :
Dept. of Comput. Sci., Xiamen Univ., Xiamen, China
Abstract :
Malware such as viruses, backdoors, spyware, trojans, and worms mounts numerous attacks and presents a major security threat to computer users. The most significant line of defense against malware is antivirus (AV) products, which detect, remove, and characterize these threats. The ability of AV products to characterize threats successfully depends greatly on the method used to categorize malware profiles into groups, so clustering malware into families is a computer security topic of great interest. In this paper, based on an analysis of the instructions extracted from malware samples, we propose a novel parameter-free hybrid clustering algorithm (PFHC) that combines the merits of hierarchical clustering and K-means for malware clustering. It not only generates a stable initial partition but also determines the best K. PFHC first uses agglomerative hierarchical clustering as the framework, starting with N singleton clusters, each containing exactly one sample; at every level it reuses the centroids of the upper level and merges the two nearest clusters; it then applies K-means iterations to reach an approximately globally optimal partition. PFHC evaluates the clustering validity at each iteration and selects the best K by comparing these values. Promising experiments on a real daily data collection show that, compared with popular existing K-means and hierarchical clustering approaches, the proposed PFHC algorithm always generates much higher quality clusters and is well suited to malware categorization.
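The loop described in the abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the abstract does not name the validity index or the distance measure, so the mean silhouette coefficient and Euclidean distance over numeric feature vectors are assumptions, and every function name (`pfhc_sketch`, `kmeans`, `silhouette`) is hypothetical.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans(points, centroids, max_iter=100):
    """Lloyd iterations seeded with the given centroids (not random ones)."""
    centroids, labels = list(centroids), None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = mean(members)
    return centroids, labels

def silhouette(points, labels):
    """Mean silhouette coefficient (ASSUMED validity index)."""
    clusters = {}
    for i, l in enumerate(labels):
        clusters.setdefault(l, []).append(i)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # singletons contribute 0 by convention
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(math.dist(p, points[j]) for j in idx) / len(idx)
                for l, idx in clusters.items() if l != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def pfhc_sketch(points, max_iter=100):
    """Return (best_k, labels, score) over the levels K = N-1 down to 2."""
    points = [tuple(p) for p in points]
    centroids = list(points)            # N singleton clusters
    labels = list(range(len(points)))
    best = (None, None, -math.inf)
    while len(centroids) > 2:
        k = len(centroids)
        # Merge the two clusters whose centroids are nearest.
        i, j = min(((a, b) for a in range(k) for b in range(a + 1, k)),
                   key=lambda ab: math.dist(centroids[ab[0]], centroids[ab[1]]))
        members = [p for p, l in zip(points, labels) if l in (i, j)]
        merged = mean(members) if members else mean([centroids[i], centroids[j]])
        # Reuse the remaining centroids as the seed for this level's K-means.
        seed = [c for idx, c in enumerate(centroids) if idx not in (i, j)]
        centroids, labels = kmeans(points, seed + [merged], max_iter)
        score = silhouette(points, labels)
        if score > best[2]:
            best = (len(set(labels)), list(labels), score)
    return best
```

For example, `pfhc_sketch([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)])` takes no K as input and selects two clusters for these two well-separated groups. The brute-force nearest-pair search makes the sketch cubic in N, so it only suits small inputs.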
Keywords :
invasive software; pattern clustering; statistical analysis; K-means algorithms; antivirus products; hierarchical clustering; malware categorization; parameter free hybrid clustering algorithm; Algorithm design and analysis; Clustering algorithms; Computer science; Computer security; Computer viruses; Computer worms; Invasive software; Laboratories; Partitioning algorithms; Software algorithms; Hierarchical clustering; K-means; Malware categorization; Parameter-Free Hybrid Clustering (PFHC);
Conference_Title :
Anti-counterfeiting, Security, and Identification in Communication, 2009. ASID 2009. 3rd International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-3883-9
Electronic_ISBN :
978-1-4244-3884-6
DOI :
10.1109/ICASID.2009.5276982