DocumentCode
1625411
Title
Efficiently Evaluating Order Preserving Similarity Queries over Historical Market-Basket Data
Author
Sherkat, Reza; Rafiei, Davood
Author_Institution
University of Alberta
fYear
2006
Firstpage
19
Lastpage
19
Abstract
We introduce a new domain-independent framework for formulating and efficiently evaluating similarity queries over historical data, where given a history as a sequence of timestamped observations and the pair-wise similarity of observations, we want to find similar histories. For instance, given a database of customer transactions and a time period, we can find customers with similar purchasing behaviors over this period. Our work is different from the work on retrieving similar time series; it addresses the general problem in which a history cannot be modeled as a time series, hence the relevant conventional approaches are not applicable. We derive a similarity measure for histories, based on an aggregation of the similarities between the observations of the two histories, and propose efficient algorithms for finding an optimal alignment between two histories. Given the non-metric nature of our measure, we develop some upper bounds and an algorithm that makes use of those bounds to prune histories that are guaranteed not to be in the answer set. Our experimental results on real and synthetic data confirm the effectiveness and efficiency of our approach. For instance, when the minimum length of a match is provided, our algorithm achieves up to an order of magnitude speed-up over alternative methods.
Keywords
Blood; Data mining; History; Hospitals; Hydrogen; Internet; Transaction databases; Upper bound; Warehousing; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering, 2006. ICDE '06. Proceedings of the 22nd International Conference on
Print_ISBN
0-7695-2570-9
Type
conf
DOI
10.1109/ICDE.2006.59
Filename
1617387