• DocumentCode
    624950
  • Title
    mvHash-B - A New Approach for Similarity Preserving Hashing
  • Author
    Breitinger, Frank; Astebøl, Knut Petter; Baier, Harald; Busch, Christoph
  • Author_Institution
    da/sec - Biometrics & Internet Security Res. Group, Hochschule Darmstadt, Darmstadt, Germany
  • fYear
    2013
  • fDate
    12-14 March 2013
  • Firstpage
    33
  • Lastpage
    44
  • Abstract
    The handling of hundreds of thousands of files is a major challenge in today's IT forensic investigations. In order to cope with this information overload, investigators use fingerprints (hash values) to identify known files automatically using blacklists or whitelists. Besides detecting exact duplicates, it is also helpful to locate similar files by using similarity preserving hashing (SPH). We present a new algorithm for similarity preserving hashing. It is based on the idea of majority voting in conjunction with run length encoding to compress the input data and uses Bloom filters to represent the fingerprint. It is therefore called mvHash-B. Our assessment shows that mvHash-B is superior to other SPHs with respect to run time efficiency: It is almost as fast as SHA-1 and thus faster than any other SPH algorithm. Additionally, the hash value length is approximately 0.5% of the input length and hence outperforms most existing algorithms. Finally, we show that the robustness of mvHash-B against active manipulation is sufficient for practical purposes.
  • Keywords
    cryptography; data compression; data structures; digital forensics; fingerprint identification; Bloom filter; IT forensic investigation; SHA-1; SPH algorithm; blacklist; duplicate detection; file identification; fingerprint; hash value length; information overload; majority voting; mvHash-B; run length encoding; similarity preserving hashing; whitelist; Approximation algorithms; Databases; Encoding; Forensics; Robustness; fuzzy hashing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2013 Seventh International Conference on IT Security Incident Management and IT Forensics (IMF)
  • Conference_Location
    Nuremberg
  • Print_ISBN
    978-1-4673-6307-5
  • Type
    conf
  • DOI
    10.1109/IMF.2013.18
  • Filename
    6568552
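
The abstract names three building blocks: majority voting, run length encoding, and a Bloom filter fingerprint. The following is a minimal Python sketch of how such a pipeline could fit together; it is not the authors' implementation. The abstract does not specify the window size, the grouping of run lengths, the Bloom filter size, the number of hash functions, or the comparison function, so `window`, `group`, `m_bits`, `k`, and `similarity` are all hypothetical choices made for illustration.

```python
import hashlib


def majority_vote(data, window=50):
    """Return one 0/1 symbol per input byte.

    A byte maps to 1 if at least half of the bits in its surrounding
    window are set. The window size is an assumed parameter.
    """
    ones_per_byte = [bin(b).count("1") for b in data]
    prefix = [0]  # prefix sums of set-bit counts for fast window queries
    for c in ones_per_byte:
        prefix.append(prefix[-1] + c)
    half = window // 2
    symbols = []
    for i in range(len(data)):
        lo = max(0, i - half)
        hi = min(len(data), i + half + 1)
        set_bits = prefix[hi] - prefix[lo]
        symbols.append(1 if 2 * set_bits >= 8 * (hi - lo) else 0)
    return symbols


def run_lengths(symbols):
    """Run length encode a 0/1 sequence, keeping only the run lengths."""
    runs = []
    previous = None
    for s in symbols:
        if s == previous:
            runs[-1] += 1
        else:
            runs.append(1)
            previous = s
    return runs


def bloom_fingerprint(runs, group=4, m_bits=2048, k=3):
    """Insert groups of run lengths into one Bloom filter (stored as an int).

    group, m_bits and k are assumptions; SHA-1 is used here only as a
    convenient source of k bit positions.
    """
    bloom = 0
    for i in range(0, len(runs), group):
        chunk = ",".join(map(str, runs[i:i + group])).encode()
        digest = hashlib.sha1(chunk).digest()
        for j in range(k):
            index = int.from_bytes(digest[4 * j:4 * j + 4], "big") % m_bits
            bloom |= 1 << index
    return bloom


def mvhash_sketch(data):
    """Chain the three steps into one fingerprint."""
    return bloom_fingerprint(run_lengths(majority_vote(data)))


def similarity(a, b):
    """Hypothetical score: shared set bits divided by bits set in either filter."""
    union = bin(a | b).count("1")
    return 1.0 if union == 0 else bin(a & b).count("1") / union
```

Under these assumptions, calling `similarity(mvhash_sketch(x), mvhash_sketch(y))` on two byte strings gives a score between 0 and 1, with identical inputs scoring 1.0. A single Bloom filter is used for brevity; a practical scheme would need some way to keep the filter from saturating on large inputs.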