DocumentCode
1960424
Title
Efficient searches for similar subsequences of different lengths in sequence databases
Author
Park, Sanghyun ; Chu, Wesley W. ; Yoon, Jeehee ; Hsu, Chihcheng
Author_Institution
California Univ., Los Angeles, CA, USA
fYear
2000
fDate
2000
Firstpage
23
Lastpage
32
Abstract
We propose an indexing technique for fast retrieval of similar subsequences using time warping distances. A time warping distance is a more suitable similarity measure than the Euclidean distance in many applications, where sequences may be of different lengths or different sampling rates. Our indexing technique uses a disk-based suffix tree as an index structure and employs lower-bound distance functions to filter out dissimilar subsequences without false dismissals. To make the index structure compact and thus accelerate the query processing, we convert sequences of continuous values to sequences of discrete values via a categorization method and store only a subset of suffixes whose first values are different from their preceding values. The experimental results reveal that our proposed technique can be a few orders of magnitude faster than sequential scanning.
Keywords
database indexing; query processing; tree data structures; Euclidean distance; continuous values; discrete values; disk-based suffix tree; experimental results; lower-bound distance functions; sampling rates; sequence databases; sequential scanning; similar subsequence searching; similarity measure; time warping distances; Acceleration; Databases; Filters; Indexing; Information retrieval; Length measurement; Query processing; Sampling methods; Time measurement
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering, 2000. Proceedings. 16th International Conference on
Conference_Location
San Diego, CA
ISSN
1063-6382
Print_ISBN
0-7695-0506-6
Type
conf
DOI
10.1109/ICDE.2000.839384
Filename
839384