Efficiently Evaluating Order Preserving Similarity Queries over Historical Market-Basket Data

Author

Sherkat, Reza; Rafiei, Davood

Author_Institution

University of Alberta

fYear

2006

Firstpage

19

Lastpage

19

Abstract

We introduce a new domain-independent framework for formulating and efficiently evaluating similarity queries over historical data. Given histories, each a sequence of timestamped observations, and a pairwise similarity function on observations, the goal is to find similar histories. For instance, given a database of customer transactions and a time period, we can find customers with similar purchasing behaviors over that period. Our work differs from work on retrieving similar time series: it addresses the more general problem in which a history cannot be modeled as a time series, so the conventional time-series approaches do not apply. We derive a similarity measure for histories based on an aggregation of the similarities between the observations of the two histories, and we propose efficient algorithms for finding an optimal alignment between two histories. Because our measure is non-metric, we develop upper bounds and an algorithm that uses those bounds to prune histories that are guaranteed not to be in the answer set. Our experimental results on real and synthetic data confirm the effectiveness and efficiency of our approach; for instance, when the minimum length of a match is provided, our algorithm achieves up to an order-of-magnitude speed-up over alternative methods.
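
The abstract does not spell out the measure or the bounds, so the following is only a hedged illustration of the general idea, not the authors' method. It assumes that observations are market baskets (sets of items), that observation similarity is the Jaccard coefficient, and that history similarity is the maximum total similarity over order-preserving one-to-one alignments, computed with a weighted-LCS-style dynamic program. The names `jaccard`, `align_score`, `upper_bound`, and `range_query` are hypothetical, and the `min(n, m)` bound is a simple stand-in (valid when similarities lie in [0, 1]) for the paper's tighter bounds.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Basket = FrozenSet[str]
History = Sequence[Basket]  # observations assumed sorted by timestamp


def jaccard(a: Basket, b: Basket) -> float:
    """Assumed observation similarity: Jaccard coefficient of two baskets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def align_score(h1: History, h2: History,
                sim: Callable[[Basket, Basket], float] = jaccard) -> float:
    """Best total similarity over order-preserving one-to-one alignments.

    O(len(h1) * len(h2)) dynamic program: matched pairs must respect the
    temporal order of both histories; unmatched observations add nothing.
    """
    n, m = len(h1), len(h2)
    # best[i][j]: optimal score aligning the first i of h1 with the first j of h2
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j],  # leave h1[i-1] unmatched
                best[i][j - 1],  # leave h2[j-1] unmatched
                best[i - 1][j - 1] + sim(h1[i - 1], h2[j - 1]),  # match them
            )
    return best[n][m]


def upper_bound(h1: History, h2: History) -> float:
    """Hypothetical cheap bound: with 0 <= sim <= 1, at most min(n, m) pairs match."""
    return float(min(len(h1), len(h2)))


def range_query(query: History, db: List[Tuple[str, History]],
                threshold: float) -> List[Tuple[str, float]]:
    """Return (id, score) for histories scoring >= threshold, pruning by the bound."""
    answers = []
    for hid, h in db:
        if upper_bound(query, h) < threshold:
            continue  # cannot reach the threshold, so skip the dynamic program
        score = align_score(query, h)
        if score >= threshold:
            answers.append((hid, score))
    return answers


if __name__ == "__main__":
    q = [frozenset({"milk", "bread"}), frozenset({"beer"})]
    db = [("c1", [frozenset({"milk"}), frozenset({"beer", "chips"})]),
          ("c2", [frozenset({"tea"})])]
    print(range_query(q, db, threshold=1.0))  # [('c1', 1.0)]
```

Here `c1` aligns milk with milk and beer with beer for a score of 0.5 + 0.5 = 1.0, while `c2` shares no items with the query. Raising `threshold` above 1.0 lets the bound discard `c2` without running the dynamic program, which is the filter-then-refine pattern the abstract describes; a real system would use tighter bounds so that more candidates are pruned.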

Keywords

Blood; Data mining; History; Hospitals; Hydrogen; Internet; Transaction databases; Upper bound; Warehousing; Web pages;

fLanguage

English

Publisher

IEEE

Conference_Titel

Data Engineering, 2006. ICDE '06. Proceedings of the 22nd International Conference on

Print_ISBN

0-7695-2570-9

Type

conf

DOI

10.1109/ICDE.2006.59

Filename

1617387