DocumentCode :
3523612
Title :
Hamming distance based approximate similarity text search algorithm
Author :
Haifeng Hu ; Liang Zhang ; Jianshen Wu
Author_Institution :
Dept. of Telecommun. & Inf. Eng., Nanjing Univ. of Posts & Telecommun., Nanjing, China
fYear :
2015
fDate :
27-29 March 2015
Firstpage :
1
Lastpage :
6
Abstract :
We propose a Hamming distance based approximate similarity text search (HASTS) algorithm to improve the quality of queries in massive text data. The HASTS algorithm first constructs an index table from substrings extracted randomly from the feature fingerprints generated by the SimHash algorithm. Then, it assigns weights to text terms to reduce the size of the candidate set. The final query result is then obtained by comparing the Hamming distance between the query term and the text terms in the candidate set. Finally, extensive simulations are conducted to analyze the influence of different parameters on the query performance of the HASTS algorithm and to compare its performance with that of the existing search algorithm. The results show that the HASTS algorithm can satisfy the query requirements in massive text data with relatively low overhead.
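The pipeline outlined in the abstract (SimHash fingerprints, an index table keyed on randomly sampled fingerprint substrings, then a Hamming distance check on the candidates) can be illustrated with a minimal sketch. This is not the authors' implementation: the 64-bit fingerprint length, the MD5-based token hash, the unweighted whitespace tokens, the class name `SampledBitIndex`, and the parameters `num_tables`, `bits_per_key`, and `max_distance` are all assumptions made for illustration, and the term-weighting step mentioned in the abstract is omitted because the record does not describe it.

```python
import hashlib
import random
from collections import defaultdict

BITS = 64  # assumed fingerprint length; the record does not state one


def simhash(text: str) -> int:
    """Compute a SimHash fingerprint from whitespace tokens (unweighted)."""
    counts = [0] * BITS
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    # Bit i of the fingerprint is set when the vote for that bit is positive.
    return sum(1 << i for i in range(BITS) if counts[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


class SampledBitIndex:
    """Hypothetical index: each table is keyed on a random subset of bits."""

    def __init__(self, num_tables: int = 8, bits_per_key: int = 12, seed: int = 0):
        rng = random.Random(seed)
        self.positions = [rng.sample(range(BITS), bits_per_key)
                          for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.fingerprints = {}

    @staticmethod
    def _key(fp: int, positions) -> tuple:
        return tuple((fp >> p) & 1 for p in positions)

    def add(self, doc_id: str, text: str) -> None:
        fp = simhash(text)
        self.fingerprints[doc_id] = fp
        for table, pos in zip(self.tables, self.positions):
            table[self._key(fp, pos)].append(doc_id)

    def query(self, text: str, max_distance: int = 3):
        fp = simhash(text)
        # Candidate set: documents sharing at least one sampled-bit key.
        candidates = set()
        for table, pos in zip(self.tables, self.positions):
            candidates.update(table.get(self._key(fp, pos), ()))
        # Final filter: keep candidates within the Hamming distance threshold.
        scored = ((hamming(fp, self.fingerprints[d]), d) for d in candidates)
        return sorted(pair for pair in scored if pair[0] <= max_distance)
```

In this sketch, a document whose fingerprint differs from the query in only a few bits is likely to agree with it on all sampled bits in at least one table, so it enters the candidate set without a full scan; raising `num_tables` or lowering `bits_per_key` trades a larger candidate set for higher recall.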
Keywords :
query processing; text analysis; HASTS algorithm; Hamming distance; SimHash algorithm; approximate similarity text search algorithm; feature fingerprints; index table; massive text data query; query performance; query quality; query requirements; query term; Electronic publishing; Fingerprint recognition; Indexes; Information services; Internet;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advanced Computational Intelligence (ICACI), 2015 Seventh International Conference on
Conference_Location :
Wuyi
Print_ISBN :
978-1-4799-7257-9
Type :
conf
DOI :
10.1109/ICACI.2015.7184772
Filename :
7184772