• DocumentCode
    134402
  • Title

    Locality-sensitive hashing optimizations for fast malware clustering

  • Author

    Oprisa, Ciprian ; Checiches, Marius ; Nandrean, Adrian

  • Author_Institution
    Bitdefender, Bucharest, Romania
  • fYear
    2014
  • fDate
    4-6 Sept. 2014
  • Firstpage
    97
  • Lastpage
    104
  • Abstract
    Large datasets, including malware collections, are difficult to cluster. Although the algorithms involved are mostly polynomial, their long running times make them difficult to use in practice. The main issue is that classical hierarchical algorithms need to compute the distance between each pair of items. This paper presents a faster approach for clustering large collections of malware samples using a technique called locality-sensitive hashing. This approach performs single-linkage clustering faster than state-of-the-art methods while producing clusters of similar quality. Although the proposed algorithm is still quadratic in theory, the coefficient of the quadratic term is several orders of magnitude smaller. Our experiments show that we can reduce this coefficient to under 0.02% and still produce clusters 99.9% similar to the ones produced by the single-linkage algorithm.
  • Keywords
    cryptography; invasive software; optimisation; pattern clustering; polynomials; locality-sensitive hashing optimization; malware clustering; polynomial algorithm; single-linkage clustering; Algorithm design and analysis; Approximation algorithms; Arrays; Clustering algorithms; Dictionaries; Equations; Malware
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Computer Communication and Processing (ICCP), 2014 IEEE International Conference on
  • Conference_Location
    Cluj-Napoca
  • Print_ISBN
    978-1-4799-6568-7
  • Type

    conf

  • DOI
    10.1109/ICCP.2014.6936960
  • Filename
    6936960
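
The abstract's central idea, avoiding a distance computation for every pair of items by hashing similar items into the same buckets, can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not spell out. It is a generic MinHash scheme with banding, assuming each sample is represented as a non-empty set of string features and that similarity is measured with the Jaccard index. All names (`minhash_signature`, `lsh_candidate_pairs`, `cluster`) and the parameters `threshold`, `bands`, and `rows` are hypothetical choices made for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def minhash_signature(features, num_hashes):
    """Return one minimum hash value per seeded hash function."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f.encode(), digest_size=8, salt=salt).digest(),
                "little",
            )
            for f in features
        ))
    return signature


def lsh_candidate_pairs(signatures, bands, rows):
    """Return pairs of ids whose signatures agree on at least one whole band."""
    pairs = set()
    for b in range(bands):
        buckets = defaultdict(list)
        for item_id, sig in signatures.items():
            buckets[tuple(sig[b * rows:(b + 1) * rows])].append(item_id)
        for members in buckets.values():
            pairs.update(combinations(sorted(members), 2))
    return pairs


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def cluster(samples, threshold=0.5, bands=16, rows=4):
    """Approximate single-linkage clusters at a fixed similarity threshold.

    Only LSH candidate pairs are verified with an exact Jaccard computation;
    verified pairs are merged with a union-find structure, so the clusters
    are the connected components of the verified-pair graph.
    """
    parent = {k: k for k in samples}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    signatures = {k: minhash_signature(v, bands * rows) for k, v in samples.items()}
    for a, b in lsh_candidate_pairs(signatures, bands, rows):
        if jaccard(samples[a], samples[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for k in samples:
        groups[find(k)].add(k)
    return sorted(sorted(g) for g in groups.values())
```

In this sketch the expensive exact comparison runs only on pairs that share a bucket, which is the general mechanism by which locality-sensitive hashing shrinks the quadratic term. Pairs that never collide are never compared, so the result can differ slightly from exact single-linkage clustering.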