DocumentCode
134402
Title
Locality-sensitive hashing optimizations for fast malware clustering
Author
Oprisa, Ciprian ; Checiches, Marius ; Nandrean, Adrian
Author_Institution
Bitdefender, Bucharest, Romania
fYear
2014
fDate
4-6 Sept. 2014
Firstpage
97
Lastpage
104
Abstract
Large datasets, including malware collections, are difficult to cluster. Although the algorithms involved are mostly polynomial, their long running times make them difficult to use in practice. The main issue is that classical hierarchical algorithms need to compute the distance between every pair of items. This paper presents a faster approach for clustering large collections of malware samples using a technique called locality-sensitive hashing. This approach performs single-linkage clustering faster than state-of-the-art methods while producing clusters of similar quality. Although the proposed algorithm is still quadratic in theory, the coefficient of the quadratic term is several orders of magnitude smaller. Our experiments show that we can reduce this coefficient to under 0.02% and still produce clusters 99.9% similar to those produced by the single-linkage algorithm.
Keywords
cryptography; invasive software; optimisation; pattern clustering; polynomials; locality-sensitive hashing optimization; malware clustering; polynomial algorithm; single-linkage clustering; Algorithm design and analysis; Approximation algorithms; Arrays; Clustering algorithms; Dictionaries; Equations; Malware
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Computer Communication and Processing (ICCP), 2014 IEEE International Conference on
Conference_Location
Cluj Napoca
Print_ISBN
978-1-4799-6568-7
Type
conf
DOI
10.1109/ICCP.2014.6936960
Filename
6936960