Title :
An index structure for similarity join based on high-frequency queries
Author :
Kunanusont, Kamolwan ; Chongstitvatana, Jaruloj
Author_Institution :
Dept. of Math. & Comput. Sci., Chulalongkorn Univ., Bangkok, Thailand
Date :
July 30 2014-Aug. 1 2014
Abstract :
String databases are widely used in many applications today, and searching for texts that are similar to query texts is a common requirement. Similarity join finds pairs of texts whose similarity exceeds a given threshold. Much research has been done to reduce the running time of similarity join. The filter-and-verify framework is one approach: it first filters out dissimilar pairs of texts and then verifies the remaining pairs. Prefix filtering is a filter-and-verify method that eliminates dissimilar pairs of texts by comparing only prefixes of the texts. However, existing similarity join algorithms disregard the frequencies of queries. Data collected from Google Trends explorer show that some queries appear with higher frequency than others. This paper aims to reduce the running time of similarity join by focusing on these high-frequency queries. Indices are created from the high-frequency queries to speed up these queries and any queries that are similar to them. The proposed indices and similarity join algorithm are implemented to evaluate their performance. Experiments show that the proposed method outperforms a leading similarity join algorithm, AdaptSearch, when queries are similar to a high-frequency query.
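The filter-and-verify idea behind prefix filtering can be illustrated with a minimal sketch. This is generic textbook prefix filtering, not the index structure proposed in the paper and not AdaptSearch. Several choices are assumptions made only for illustration: texts are split into whitespace tokens, similarity is Jaccard on token sets, a pair qualifies when its similarity is greater than or equal to the threshold, and the name `prefix_filter_join` is hypothetical.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def prefix_filter_join(records, threshold):
    """Return sorted pairs (i, j), i < j, whose Jaccard similarity >= threshold.

    Assumption: each record is a string tokenized on whitespace.
    """
    token_sets = [set(r.split()) for r in records]

    # Global token order: rarest tokens first, ties broken alphabetically.
    freq = Counter(tok for s in token_sets for tok in s)
    ordered = [sorted(s, key=lambda tok: (freq[tok], tok)) for s in token_sets]

    index = defaultdict(list)  # token -> ids of earlier records with it in their prefix
    results = []
    for i, tokens in enumerate(ordered):
        n = len(tokens)
        # Jaccard prefix length: n - ceil(t * n) + 1.
        # The small epsilon guards against float error such as 0.7 * 10 > 7.
        prefix = tokens[: n - math.ceil(threshold * n - 1e-9) + 1]

        # Filter: only records sharing a prefix token can reach the threshold.
        candidates = {j for tok in prefix for j in index.get(tok, ())}

        # Verify: compute the exact similarity for the surviving candidates.
        for j in sorted(candidates):
            if jaccard(token_sets[i], token_sets[j]) >= threshold:
                results.append((j, i))

        for tok in prefix:
            index[tok].append(i)
    return sorted(results)


queries = [
    "apple iphone 6 release",
    "apple iphone 6 release date",
    "world cup 2014 schedule",
    "world cup 2014",
]
pairs = prefix_filter_join(queries, 0.7)
```

With these made-up query strings, only the two near-duplicate pairs share a prefix token, so only they reach the verification step. The paper's contribution, as the abstract describes it, is to build indices around high-frequency queries instead of treating all queries uniformly as this sketch does.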
Keywords :
indexing; information filtering; query processing; text analysis; AdaptSearch similarity join algorithm; Google trends explorer; filter-and-verify framework; high-frequency queries; index structure; performance evaluation; prefix filtering; query texts; strings databases; Algorithm design and analysis; Computer science; Filtering; Filtering algorithms; Indexes; Time-frequency analysis; High-frequency queries; Prefix filtering; Similarity join;
Conference_Title :
Computer Science and Engineering Conference (ICSEC), 2014 International
Conference_Location :
Khon Kaen
Print_ISBN :
978-1-4799-4965-6
DOI :
10.1109/ICSEC.2014.6978233