DocumentCode :
2335259
Title :
FlExPat: flexible extraction of sequential patterns
Author :
Rolland, Pierre-Yves
Author_Institution :
Lab. d'Informatique de Paris 6, Univ. d'Aix-Marseille III, Marseille, France
fYear :
2001
fDate :
2001
Firstpage :
481
Lastpage :
488
Abstract :
This paper addresses sequential data mining, a sub-area of data mining where the data to be analyzed is organized in sequences. In many problem domains a natural ordering exists over data. Examples of sequential databases (SDBs) include: (a) collections of temporal data sequences, such as chronological series of daily stock indices or multimedia data (sound, music, video, etc.); and (b) macromolecule banks, where amino acid or proteic sequences are represented as strings. In an SDB it is often valuable to detect regularities within one sequence or across several. In particular, finding exact or approximate repetitions of segments can be utilized directly (e.g. for determining the biochemical activity of a protein region) or indirectly (e.g. for prediction in finance). To this end, we present concepts and an algorithm for automatically extracting sequential patterns from a sequential database. Such a pattern is defined as a group of significantly similar segments from one or several sequences. Appropriate functions for measuring similarity between sequence segments are proposed, generalizing the edit distance framework. There is a trade-off between flexibility, particularly in sequence data representation and in associated similarity metrics, and computational efficiency. We designed the FlExPat algorithm to satisfactorily cope with this trade-off. FlExPat's complexity is in practice less than quadratic in the total length of the SDB analyzed, while allowing high flexibility. Some experimental results obtained with FlExPat on music data are presented and discussed.
Keywords :
data mining; multimedia databases; music; pattern recognition; sequences; FlExPat; amino acid sequences; chronological series; computational efficiency; daily stock indices; flexible sequential pattern extraction; macromolecule banks; multimedia data; music data; proteic sequences; regularity detection; segment repetition; sequence segment similarity; sequences; sequential data mining; sequential databases; similarity metrics; strings; temporal data sequences; Algorithm design and analysis; Amino acids; Computational efficiency; Data analysis; Data mining; Finance; Multimedia databases; Music; Proteins; Sequences;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location :
San Jose, CA
Print_ISBN :
0-7695-1119-8
Type :
conf
DOI :
10.1109/ICDM.2001.989555
Filename :
989555