DocumentCode :
1960424
Title :
Efficient searches for similar subsequences of different lengths in sequence databases
Author :
Park, Sanghyun ; Chu, Wesley W. ; Yoon, Jeehee ; Hsu, Chihcheng
Author_Institution :
California Univ., Los Angeles, CA, USA
fYear :
2000
fDate :
2000
Firstpage :
23
Lastpage :
32
Abstract :
We propose an indexing technique for fast retrieval of similar subsequences using time warping distances. A time warping distance is a more suitable similarity measure than the Euclidean distance in many applications where sequences may have different lengths or different sampling rates. Our indexing technique uses a disk-based suffix tree as an index structure and employs lower-bound distance functions to filter out dissimilar subsequences without false dismissals. To make the index structure compact and thus accelerate query processing, we convert sequences of continuous values to sequences of discrete values via a categorization method and store only a subset of suffixes whose first values are different from their preceding values. The experimental results reveal that our proposed technique can be a few orders of magnitude faster than sequential scanning.
Keywords :
database indexing; query processing; tree data structures; Euclidean distance; continuous values; discrete values; disk-based suffix tree; experimental results; lower-bound distance functions; sampling rates; sequence databases; sequential scanning; similar subsequence searching; similarity measure; time warping distances; Acceleration; Databases; Filters; Indexing; Information retrieval; Length measurement; Sampling methods; Time measurement
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering, 2000. Proceedings. 16th International Conference on
Conference_Location :
San Diego, CA
ISSN :
1063-6382
Print_ISBN :
0-7695-0506-6
Type :
conf
DOI :
10.1109/ICDE.2000.839384
Filename :
839384