DocumentCode
2220140
Title
A scalable algorithm for mining maximal frequent sequences using sampling
Author
Luo, Congnan ; Chung, Soon M.
Author_Institution
Dept. of Comput. Sci. & Eng., Wright State Univ., Dayton, OH, USA
fYear
2004
fDate
15-17 Nov. 2004
Firstpage
156
Lastpage
165
Abstract
We propose MSPS (Maximal Sequential Patterns using Sampling), an efficient and scalable algorithm for mining maximal sequential patterns. MSPS prunes much more of the search space than other algorithms because it applies both subsequence infrequency based pruning and supersequence frequency based pruning. In MSPS, a sampling technique is used to identify long frequent sequences early, instead of enumerating all their subsequences. We also propose a method for adjusting the user-specified minimum support level when mining a sample of the database to achieve better performance; this adjustment makes sampling more efficient when the minimum support is small. A signature technique is used for the subsequence infrequency based pruning when the seed set of frequent sequences for candidate generation is too large to be loaded into memory. A prefix tree structure is developed to count candidate sequences of different sizes during database scanning, and it also facilitates customer sequence trimming. In our experiments, MSPS showed very good performance and better scalability than other algorithms.
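The two pruning rules named in the abstract can be illustrated with a minimal Python sketch. This is not the MSPS implementation. It assumes a simplified model in which a sequence is a tuple of single items (sequential pattern mining usually works with sequences of itemsets), and the names `is_subsequence` and `prune_candidates` are hypothetical, introduced only for illustration.

```python
def is_subsequence(short, long):
    """Return True if `short` can be obtained from `long` by deleting items."""
    remaining = iter(long)
    # `item in remaining` advances the iterator, so order is preserved.
    return all(item in remaining for item in short)


def prune_candidates(candidates, known_frequent, known_infrequent):
    """Return the candidates whose support still has to be counted."""
    undecided = []
    for cand in candidates:
        # Supersequence frequency based pruning: a subsequence of a known
        # frequent sequence is frequent too, so it needs no counting.
        if any(is_subsequence(cand, freq) for freq in known_frequent):
            continue
        # Subsequence infrequency based pruning: a candidate containing a
        # known infrequent subsequence cannot be frequent.
        if any(is_subsequence(infreq, cand) for infreq in known_infrequent):
            continue
        undecided.append(cand)
    return undecided


candidates = [("a", "b"), ("a", "c", "d"), ("b", "e"), ("e", "b")]
known_frequent = [("a", "b", "c")]    # e.g. a long sequence found via sampling
known_infrequent = [("c", "d")]
remaining = prune_candidates(candidates, known_frequent, known_infrequent)
```

In this toy run, `("a", "b")` is dropped because it is contained in a known frequent sequence, `("a", "c", "d")` is dropped because it contains a known infrequent sequence, and only the two candidates involving `"e"` would still need counting against the database.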
Keywords
data mining; database management systems; pattern recognition; tree data structures; MSPS mining; candidate generation; customer sequence trimming; database sample; database scanning; maximal sequential patterns; prefix tree structure; sampling technique; scalable algorithm; subsequence infrequency based pruning; supersequence frequency based pruning; user-specified minimum support level; Association rules; Computer science; Costs; Data analysis; Data mining; Databases; Frequency; Sampling methods; Scalability; Tree data structures;
fLanguage
English
Publisher
IEEE
Conference_Titel
Tools with Artificial Intelligence, 2004. ICTAI 2004. 16th IEEE International Conference on
ISSN
1082-3409
Print_ISBN
0-7695-2236-X
Type
conf
DOI
10.1109/ICTAI.2004.16
Filename
1374182