DocumentCode :
2081142
Title :
Efficient processing of substring match queries with inverted q-gram indexes
Author :
Kim, Younghoon ; Woo, Kyoung-Gu ; Park, Hyoungmin ; Shim, Kyuseok
Author_Institution :
Seoul Nat. Univ., Seoul, South Korea
fYear :
2010
fDate :
1-6 March 2010
Firstpage :
721
Lastpage :
732
Abstract :
With the widespread adoption of the Internet, text-based data sources have become ubiquitous, and the demand for effective support of string matching queries keeps increasing. The relational query language SQL supports the LIKE clause over string data to handle substring matching queries. Because such queries are popular, there have been many studies on designing efficient indexes to support the LIKE clause in SQL. Among them, q-gram based indexes have been studied extensively. However, how to process substring matching queries efficiently with such indexes has received very little attention until recently. In this paper, we show that the execution plan for intersecting the posting lists of q-grams in substring matching queries should be decided judiciously. We then present optimal and approximate algorithms based on cost estimation for substring matching queries. A performance study confirms that our techniques significantly improve query execution time with q-gram indexes compared to traditional algorithms.
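To make the setting in the abstract concrete, the following is a minimal sketch of answering a substring query with an inverted q-gram index. It is not the paper's algorithm: the function names (`qgrams`, `build_index`, `substring_match`) are hypothetical, and the "shortest posting list first" intersection order is a common simple heuristic standing in for the cost-based ordering the abstract describes.

```python
from collections import defaultdict


def qgrams(s, q):
    """Return all overlapping substrings of length q in s."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def build_index(strings, q):
    """Map each q-gram to the sorted ids of the strings that contain it."""
    index = defaultdict(set)
    for sid, s in enumerate(strings):
        for g in qgrams(s, q):
            index[g].add(sid)
    return {g: sorted(ids) for g, ids in index.items()}


def substring_match(pattern, strings, index, q):
    """Return the ids of strings that contain pattern as a substring."""
    if len(pattern) < q:
        # Pattern is too short to yield a q-gram: fall back to a full scan.
        return [sid for sid, s in enumerate(strings) if pattern in s]

    posting_lists = []
    for g in set(qgrams(pattern, q)):
        if g not in index:
            return []  # a required q-gram occurs nowhere
        posting_lists.append(index[g])

    # Assumed heuristic (not the paper's cost model): intersect the
    # shortest posting lists first so the candidate set shrinks quickly.
    posting_lists.sort(key=len)
    candidates = set(posting_lists[0])
    for plist in posting_lists[1:]:
        candidates.intersection_update(plist)
        if not candidates:
            break

    # Sharing every q-gram is necessary but not sufficient, so each
    # surviving candidate is verified against the actual string.
    return sorted(sid for sid in candidates if pattern in strings[sid])


if __name__ == "__main__":
    docs = ["database systems", "data mining",
            "string matching", "substring match queries"]
    idx = build_index(docs, 2)
    print(substring_match("match", docs, idx, 2))
```

The final verification step matters because two strings can share all of a pattern's q-grams without containing the pattern itself; the index only prunes candidates.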
Keywords :
SQL; query processing; string matching; text analysis; SQL; approximate algorithms; cost estimation; internet; inverted q-gram indexes; optimal algorithms; query execution time; relational query language; string data; substring match queries efficient processing; text-based data sources; Algorithm design and analysis; Cost function; Data structures; Database languages; Internet; Intrusion detection; Query processing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering (ICDE), 2010 IEEE 26th International Conference on
Conference_Location :
Long Beach, CA
Print_ISBN :
978-1-4244-5445-7
Electronic_ISBN :
978-1-4244-5444-0
Type :
conf
DOI :
10.1109/ICDE.2010.5447866
Filename :
5447866