DocumentCode :
1916470
Title :
Understanding Cloud Data Using Approximate String Matching and Edit Distance
Author :
Jupin, Joseph ; Shi, J.Y. ; Obradovic, Z.
Author_Institution :
CIS Dept., Temple Univ., Philadelphia, PA, USA
fYear :
2012
fDate :
10-16 Nov. 2012
Firstpage :
1234
Lastpage :
1243
Abstract :
For health and human services, fraud detection, and other security services, identity resolution is a core requirement for understanding big data in the cloud. Because records lack a globally unique identifier and the same identity is often captured with typographic differences, identity resolution has high space and time complexity. We propose a filter-and-verify method that substantially speeds up approximate string matching using edit distance. The method was found to be almost 80 times faster than Damerau-Levenshtein edit distance (130 times when combined with other optimizations) while preserving all approximate matches. It creates compressed signatures for data fields and uses Boolean operations and an enhanced bit counter to quickly compare the distance between fields. The method is intended for data records whose fields contain relatively short strings, such as those found in most demographic data. Without loss of accuracy, the proposed Fast Bitwise Filter provides a substantial performance gain for approximate string comparison in database, record-linkage, and deduplication data processing systems.
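The abstract describes a filter-and-verify scheme: a cheap signature comparison discards pairs that cannot be within the edit-distance threshold, and only the survivors pay for a full Damerau-Levenshtein computation. The sketch below illustrates that general idea under stated assumptions; it is not the authors' Fast Bitwise Filter. The signature (one bit per character at position `ord(ch) % 64`), the helper names `signature`, `lower_bound`, `osa_distance`, and `approximate_matches`, and the threshold `k` are all hypothetical. The verify step uses the optimal-string-alignment (restricted) variant of Damerau-Levenshtein. The filter loses no matches because every character present in only one string forces at least one insertion, deletion, or substitution, and bit collisions can only lower the bound.

```python
from typing import List, Tuple


def signature(s: str) -> int:
    """Hypothetical character-presence bitmask: bit (ord(ch) % 64) per character.

    Collisions between characters sharing a bit can only shrink the bound
    computed below, so the filter never discards a true match.
    """
    sig = 0
    for ch in s:
        sig |= 1 << (ord(ch) % 64)
    return sig


def popcount(x: int) -> int:
    """Count set bits (portable alternative to int.bit_count())."""
    return bin(x).count("1")


def lower_bound(s: str, sig_s: int, t: str, sig_t: int) -> int:
    """Cheap lower bound on edit distance from Boolean ops and a bit count."""
    only_s = popcount(sig_s & ~sig_t)  # character bits present only in s
    only_t = popcount(sig_t & ~sig_s)  # character bits present only in t
    # Each insertion or deletion changes the length by exactly one.
    return max(only_s, only_t, abs(len(s) - len(t)))


def osa_distance(s: str, t: str) -> int:
    """Optimal string alignment (restricted Damerau-Levenshtein) distance."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[m][n]


def approximate_matches(query: str, candidates: List[str], k: int) -> List[Tuple[str, int]]:
    """Filter and verify: skip the O(m*n) DP whenever the bound already exceeds k."""
    q_sig = signature(query)
    results = []
    for cand in candidates:
        c_sig = signature(cand)  # in a real system this would be precomputed per field
        if lower_bound(query, q_sig, cand, c_sig) > k:
            continue  # filtered: cannot be within k edits
        dist = osa_distance(query, cand)
        if dist <= k:
            results.append((cand, dist))
    return results


if __name__ == "__main__":
    print(approximate_matches("JOHNSON", ["JONHSON", "JOHNSTON", "SMITH", "JOHNSON"], 1))
    # [('JONHSON', 1), ('JOHNSTON', 1), ('JOHNSON', 0)]
```

Here "SMITH" never reaches the dynamic-programming step: three character bits appear only in "JOHNSON", so the bound of 3 already exceeds `k = 1`. The payoff in such a design comes from storing signatures alongside each field, so the filter costs a few word-level Boolean operations per pair.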
Keywords :
cloud computing; database management systems; digital signatures; string matching; Boolean operation; Damerau-Levenshtein edit distance; approximate string matching; cloud data understanding; compressed signature; database; deduplication data processing system; fast bitwise filter; identity resolution; performance gain; record linkage; security service; Damerau-Levenshtein; Edit Distance Optimization; Filter and Verify; String Comparison
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC)
Conference_Location :
Salt Lake City, UT
Print_ISBN :
978-1-4673-6218-4
Type :
conf
DOI :
10.1109/SC.Companion.2012.149
Filename :
6495931