Title :
TLSH -- A Locality Sensitive Hash
Author :
Oliver, J. ; Chun Cheng ; Yanggui Chen
Author_Institution :
Trend Micro, North Ryde, NSW, Australia
Abstract :
Cryptographic hashes such as MD5 and SHA-1 are used in many data mining and security applications, where they serve as identifiers for files and documents. However, if a single byte of a file is changed, a cryptographic hash produces a completely different hash value. It would be very useful to work with hashes that identify similar files from their hash values alone. The security field has proposed similarity digests, and the data mining community has proposed locality sensitive hashes. Proposals include the Nilsimsa hash (a locality sensitive hash) and Ssdeep and Sdhash (both similarity digests). Here, we describe a new locality sensitive hashing scheme, TLSH. We provide algorithms for evaluating and comparing hash values and provide a reference to its open source code. We carry out an empirical evaluation of publicly available similarity digest schemes. The empirical evaluation highlights significant problems with previously proposed schemes; the TLSH scheme does not suffer from the flaws identified.
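The abstract's point about single-byte changes can be illustrated with a minimal sketch using Python's standard `hashlib` module. This is not the TLSH algorithm and is not taken from the paper; the input strings and the helper names `hex_digest` and `differing_hex_positions` are assumptions chosen only for illustration.

```python
import hashlib


def hex_digest(data: bytes, algorithm: str = "md5") -> str:
    """Return the hex digest of `data` under the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def differing_hex_positions(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Two inputs that differ in exactly one byte ("dog" vs "cog").
original = b"The quick brown fox jumps over the lazy dog"
modified = b"The quick brown fox jumps over the lazy cog"

for algorithm in ("md5", "sha1"):
    h1 = hex_digest(original, algorithm)
    h2 = hex_digest(modified, algorithm)
    # The digests share no useful structure even though the inputs
    # are nearly identical, so they cannot be used to judge similarity.
    print(algorithm, h1, h2, differing_hex_positions(h1, h2))
```

Because nearly every hex position changes, comparing the two digests says nothing about how close the inputs are, which is the gap that similarity digests and locality sensitive hashes are meant to fill.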
Keywords :
cryptography; public domain software; MD5; Nilsimsa hash; SHA-1; Sdhash; Ssdeep; TLSH; cryptographic hashes; data mining community; locality sensitive hashing scheme; open source code; security applications; security field; similarity digests; Arrays; Data mining; Electronic mail; Hamming distance; Malware; Locality sensitive hash; Nilsimsa; data fingerprinting; fuzzy hashing
Conference_Title :
Cybercrime and Trustworthy Computing Workshop (CTC), 2013 Fourth
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4799-3075-3