Author :
Moon, Yang-Sae ; Whang, Kyu-Young ; Loh, Woong-Kee
Author_Institution :
Dept. of Comput. Sci., Korea Adv. Inst. of Sci. & Technol., Taejon, South Korea
Abstract :
The authors propose a subsequence matching method, Dual Match, which exploits duality in constructing windows and significantly improves performance. Dual Match divides data sequences into disjoint windows and the query sequence into sliding windows; it is thus the dual of the method of C. Faloutsos et al. (1994), hereafter FRM, which divides data sequences into sliding windows and the query sequence into disjoint windows. We formally prove that our dual approach is correct, i.e., that it incurs no false dismissal. We also prove that, given the minimum query length, there is an upper bound on the window size that guarantees the correctness of Dual Match, and we discuss the effect of the window size on performance. To avoid the excessive storage space that an index of individual window points would require, FRM stores minimum bounding rectangles instead, which causes many false alarms. Dual Match solves this problem by storing the points directly without excessive storage overhead, because disjoint windows yield only about one point per window length of data, whereas sliding windows yield one point per data element. Experimental results show that, in most cases, Dual Match provides a large improvement over FRM in both false alarms and performance, given the same amount of storage space. In particular, for low selectivities (less than 10⁻⁴), Dual Match improves performance by up to 430-fold. On the other hand, for high selectivities (more than 10⁻²), it shows a very minor degradation (less than 29%). For selectivities in between (10⁻⁴ to 10⁻²), Dual Match performs slightly better than FRM. Dual Match is also 4.10 to 25.6 times faster than FRM in building indexes of approximately the same size. Overall, these results indicate that our approach provides a new paradigm in subsequence matching that significantly improves performance in large database applications.
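The window construction, the window-size bound, and the no-false-dismissal property can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the authors' implementation. It compares raw windows with a brute-force nested loop instead of indexing feature points in a multidimensional index, and it filters each window pair with the full tolerance eps. All function names are hypothetical. The bound computed by `max_window_size` comes from a worst-case argument given here as an illustration, not quoted from the paper. A subsequence that starts one point after a disjoint-window boundary skips up to w - 1 points before the next window begins, so it must have length at least 2w - 1 to contain one complete window.

```python
import math
from typing import List, Sequence, Set, Tuple

Window = Tuple[int, Sequence[float]]  # (start offset, window values)


def max_window_size(min_query_len: int) -> int:
    """Largest window size w such that any subsequence of length
    >= min_query_len contains at least one complete disjoint window.
    Worst case: the subsequence starts one point after a window boundary,
    skips w - 1 points, then needs w more, so 2w - 1 <= min_query_len."""
    return (min_query_len + 1) // 2


def disjoint_windows(seq: Sequence[float], w: int) -> List[Window]:
    """Data side: non-overlapping windows at offsets 0, w, 2w, ..."""
    return [(i, seq[i:i + w]) for i in range(0, len(seq) - w + 1, w)]


def sliding_windows(seq: Sequence[float], w: int) -> List[Window]:
    """Query side: one window starting at every offset 0, 1, 2, ..."""
    return [(j, seq[j:j + w]) for j in range(len(seq) - w + 1)]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def candidates(data: Sequence[float], query: Sequence[float],
               w: int, eps: float) -> Set[int]:
    """Start offsets in `data` of subsequences that may match `query`.

    A data window at offset i paired with the query window at offset j
    points to the subsequence starting at i - j. The distance over a window
    never exceeds the distance over the whole subsequence, so every true
    match is kept (no false dismissal) provided
    w <= max_window_size(len(query)). Brute force stands in for an index."""
    result: Set[int] = set()
    last_start = len(data) - len(query)
    for i, dw in disjoint_windows(data, w):
        for j, qw in sliding_windows(query, w):
            start = i - j
            if 0 <= start <= last_start and dist(dw, qw) <= eps:
                result.add(start)
    return result


def true_matches(data: Sequence[float], query: Sequence[float],
                 eps: float) -> Set[int]:
    """Exact answer by scanning every offset (used to check the filter)."""
    n = len(query)
    return {s for s in range(len(data) - n + 1)
            if dist(data[s:s + n], query) <= eps}
```

The candidate set is a superset of the answer. A post-processing step that evaluates `dist(data[s:s + len(query)], query)` for each candidate discards the false alarms. The sketch also shows where the duality pays off: the stored data side contributes only about one window per w points, while the query, which is processed once per search, bears the cost of the sliding windows.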
Keywords :
pattern matching; query processing; sequences; temporal databases; time series; Dual Match; FRM; data sequences; disjoint windows; duality based subsequence matching; false dismissal; large database applications; maximum bound; minimum bounding rectangles; minimum query length; query sequence; sliding windows; subsequence matching method; time series databases; window size; Biomedical measurements; Computer science; Data mining; Databases; Degradation; Euclidean distance; Exchange rates; Information technology