DocumentCode :
477820
Title :
Mining Sequential Pattern Using DF2Ls
Author :
Yusheng, Xu ; Lanhui, Zhang ; Zhixin, Ma ; Lian, Li ; Chen, Xiaoyun ; Dillon, Tharam S.
Author_Institution :
Sch. of Inf. Sci. & Technol., Lanzhou Univ., Lanzhou
Volume :
2
fYear :
2008
fDate :
18-20 Oct. 2008
Firstpage :
600
Lastpage :
604
Abstract :
In this paper, building on the SEP and IEP strategies proposed in our previous work, we present two novel pruning strategies, DSEP (dynamic sequence extension pruning) and DIEP (dynamic item extension pruning), which can be used in any Apriori-like sequence mining algorithm or lattice-theoretic approach. DSEP and DIEP use DF2Ls (Dynamic Frequent 2-Sequence Lists), which are built from previous enumerations, to prune infrequent candidate sequences during the mining process. At the cost of a small memory overhead, the proposed strategies prune invalid regions of the search space and effectively reduce the total cost of frequency counting. To test their effectiveness, we optimize SPAM with the proposed strategies and present the improved algorithm, SPAM+, which uses DSEP and DIEP to prune SPAM's search space by sharing dynamic frequent 2-sequence lists. A comprehensive performance study shows that SPAM+ outperforms SPAM by a factor of 10 on small datasets and by 35% to 58% on reasonably large datasets.
Keywords :
data mining; Apriori-like sequence mining algorithms; IEP; SEP; SPAM; dynamic frequent 2-sequence lists; dynamic item extension pruning; dynamic sequence extension pruning; lattice theory; pruning strategies; sequential pattern mining; Costs; Data mining; Databases; Electronic mail; Frequency; Itemsets; Sequences; Space exploration; Testing; Unsolicited electronic mail
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
Conference_Location :
Shandong
Print_ISBN :
978-0-7695-3305-6
Type :
conf
DOI :
10.1109/FSKD.2008.29
Filename :
4666187