DocumentCode
179808
Title
An index structure for similarity join based on high-frequency queries
Author
Kunanusont, Kamolwan ; Chongstitvatana, Jaruloj
Author_Institution
Dept. of Math. & Comput. Sci., Chulalongkorn Univ., Bangkok, Thailand
fYear
2014
fDate
July 30 2014-Aug. 1 2014
Firstpage
415
Lastpage
420
Abstract
String databases are widely used in many applications. Searching for texts that are similar to query texts is a common requirement. Similarity join finds pairs of texts whose similarity exceeds a given threshold. Much research has been done to reduce the running time of similarity join. The filter-and-verify framework is one approach: it first filters out dissimilar pairs of texts and then verifies the remaining pairs. Prefix filtering is a filter-and-verify method that eliminates dissimilar pairs of texts by comparing only prefixes of the texts. However, these similarity join algorithms disregard the frequencies of queries. Data collected from Google Trends explorer show that some queries appear with higher frequency than others. This paper aims to reduce the running time of similarity join by focusing on these high-frequency queries. Indices are built from the high-frequency queries to serve those queries and any queries similar to them. The proposed indices and similarity join algorithm are implemented to evaluate their performance. Experiments show that the proposed method outperforms a leading similarity join algorithm, AdaptSearch, when queries are similar to a high-frequency query.
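To make the filter-and-verify idea in the abstract concrete, the following is a minimal sketch of a generic prefix-filtering self-join. It is not the paper's index structure and not AdaptSearch. It assumes whitespace tokenization, Jaccard similarity, and a "greater than or equal to threshold" match rule, none of which are stated in the abstract; the names `jaccard` and `prefix_filter_join` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def prefix_filter_join(texts, threshold):
    """Return sorted index pairs (i, j), i < j, with Jaccard >= threshold.

    Assumption: texts are compared as sets of lowercase whitespace tokens.
    """
    token_sets = [set(t.lower().split()) for t in texts]

    # Global token order: rarest tokens first, ties broken alphabetically.
    # Rare tokens in the prefix keep the candidate lists short.
    freq = Counter(tok for s in token_sets for tok in s)
    ordered = [sorted(s, key=lambda tok: (freq[tok], tok)) for s in token_sets]

    index = defaultdict(list)  # token -> ids of texts with token in prefix
    candidates = set()

    # Filter step: two texts can only reach the threshold if their
    # prefixes share at least one token.
    for i, tokens in enumerate(ordered):
        # Small epsilon guards against float error inflating the ceiling.
        min_overlap = math.ceil(threshold * len(tokens) - 1e-9)
        prefix_len = len(tokens) - min_overlap + 1
        for tok in tokens[:prefix_len]:
            for j in index[tok]:
                candidates.add((j, i))
            index[tok].append(i)

    # Verify step: compute the exact similarity for surviving pairs only.
    return sorted(
        (i, j)
        for i, j in candidates
        if jaccard(token_sets[i], token_sets[j]) >= threshold
    )


if __name__ == "__main__":
    sample = [
        "apple banana cherry",
        "apple banana cherry date",
        "kiwi lemon mango",
        "banana cherry apple",
    ]
    print(prefix_filter_join(sample, 0.7))
```

The filter step only looks at a short prefix of each text, so most dissimilar pairs are never compared in full; the verify step then removes any false positives that the filter let through.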
Keywords
indexing; information filtering; query processing; text analysis; AdaptSearch similarity join algorithm; Google trends explorer; filter-and-verify framework; high-frequency queries; index structure; performance evaluation; prefix filtering; query texts; strings databases; Algorithm design and analysis; Computer science; Filtering; Filtering algorithms; Indexes; Time-frequency analysis; High-frequency queries; Prefix filtering; Similarity join;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Engineering Conference (ICSEC), 2014 International
Conference_Location
Khon Kaen
Print_ISBN
978-1-4799-4965-6
Type
conf
DOI
10.1109/ICSEC.2014.6978233
Filename
6978233