DocumentCode :
1965211
Title :
Efficient fuzzy search enabled hash map
Author :
Topac, V.
Author_Institution :
Dept. of Comput. Sci. & Inf. Technol., Politeh. Univ. of Timisoara, Timisoara, Romania
fYear :
2010
fDate :
15-17 July 2010
Firstpage :
39
Lastpage :
44
Abstract :
Hash maps are data structures widely used in modern programming languages like Java for their simplicity and efficiency. When fuzzy string search is needed (like in natural language processing) finding an approximate key match in a regular Java HashMap is a trivial task. It usually requires the brute force method of iterating trough the set of keys and use of string metrics methods. Although this approach works it is time consuming and loses the hashing advantage of the hash map. Another option is to use a different data structure like TreeMap, which is faster, but also have limitations on fuzzy string search. This article presents FuzzyHashMap, an extension to the regular Java HashMap data structure allowing highly efficient fuzzy string key search. Based on object oriented principles this extended hash map uses a custom key that enables different types of pre-hashing functions and different types of dynamic programming algorithms for approximate string matching. Customizable algorithms and settings bring flexibility to this new data structure, making it adaptable to each specific use case. Fuzzy string search performance comparison between FuzzyHashMap and the regular HashMap are presented for both accuracy and time consumption. Results show very good performance for FuzzyHashMap compared to the regular HashMap. Some real use cases for the extended hash map are listed. All the work described in this paper is released as open source, making it easy for the community to use and extend the capabilities of the current implementation.
Keywords :
Java; data structures; dynamic programming; fuzzy set theory; public domain software; string matching; FuzzyHashMap; Java; TreeMap; data structures; dynamic programming algorithms; fuzzy string search; natural language processing; open source; programming languages; string matching; string metrics methods; Approximation algorithms; Communities; Hash Map; Java; approximate string matching; fuzzy search;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Soft Computing Applications (SOFA), 2010 4th International Workshop on
Conference_Location :
Arad
Print_ISBN :
978-1-4244-7985-6
Type :
conf
DOI :
10.1109/SOFA.2010.5565628
Filename :
5565628
Link To Document :
بازگشت