DocumentCode :
1070582
Title :
A General Model for Sequential Pattern Mining with a Progressive Database
Author :
Huang, Jen-Wei ; Tseng, Chi-Yao ; Ou, Jian-Chih ; Chen, Ming-Syan
Author_Institution :
Nat. Taiwan Univ., Taipei
Volume :
20
Issue :
9
fYear :
2008
Firstpage :
1153
Lastpage :
1167
Abstract :
Although there have been many recent studies on mining sequential patterns in a static database and in a database with increasing data, these works, in general, do not fully explore the effect of deleting old data from the sequences in the database. When sequential patterns are generated, newly arriving patterns may not be identified as frequent sequential patterns because of the old data and sequences that remain. Even worse, obsolete sequential patterns that have not been frequent recently may stay in the reported results. In practice, users are usually more interested in recent data than in old data. To capture the dynamic nature of data addition and deletion, we propose a general model of sequential pattern mining with a progressive database, in which data may be static, inserted, or deleted. In addition, we present a progressive algorithm, Pisa, which stands for progressive mining of sequential patterns, to progressively discover sequential patterns in a defined time period of interest (POI). The POI is a sliding window that advances continuously as time goes by. Pisa utilizes a progressive sequential tree to efficiently maintain the latest data sequences, discover the complete set of up-to-date sequential patterns, and delete obsolete data and patterns accordingly. The height of the proposed sequential pattern tree is bounded by the length of the POI, which effectively limits the memory required by Pisa to significantly less than that needed by the alternative method, direct appending (DirApp). Note that sequential pattern mining with a static database and with an incremental database are special cases of progressive sequential pattern mining. By changing the start time and end time of the POI, Pisa can easily deal with a static database or an incremental database as well. The complexity of the proposed algorithms is analyzed. The experimental results show that Pisa not only significantly outperforms the prior methods in execution time by orders of magnitude but also possesses graceful scalability.
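The sliding-window idea in the abstract can be illustrated with a small brute-force sketch. This is not Pisa and does not build the progressive sequential tree; it only recomputes frequent patterns from scratch over the elements that fall inside the POI. The function names, the data layout (sequence id mapped to `(timestamp, item)` pairs), single-item elements, the `max_len` cap, and the definition of support as a fraction of sequences with data inside the window are all assumptions made for illustration.

```python
from itertools import product


def window(db, end_time, poi):
    """Keep only items whose timestamp lies in (end_time - poi, end_time]."""
    start = end_time - poi
    recent = {}
    for seq_id, elements in db.items():
        items = [item for t, item in sorted(elements) if start < t <= end_time]
        if items:  # sequences with no data in the POI are treated as obsolete
            recent[seq_id] = items
    return recent


def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order (not necessarily contiguously)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def frequent_patterns(db, end_time, poi, min_support, max_len=2):
    """Brute-force count of patterns up to max_len within the current POI."""
    recent = window(db, end_time, poi)
    if not recent:
        return {}
    alphabet = sorted({item for seq in recent.values() for item in seq})
    result = {}
    for length in range(1, max_len + 1):
        for pattern in product(alphabet, repeat=length):
            count = sum(is_subsequence(pattern, seq) for seq in recent.values())
            if count / len(recent) >= min_support:
                result[pattern] = count
    return result


# Hypothetical toy database: sequence id -> list of (timestamp, item)
db = {
    "s1": [(1, "A"), (2, "B"), (5, "C")],
    "s2": [(1, "A"), (3, "B")],
    "s3": [(4, "C"), (5, "D")],
}

early = frequent_patterns(db, end_time=3, poi=3, min_support=1.0)
late = frequent_patterns(db, end_time=5, poi=3, min_support=0.6)
```

With this toy data, the pattern `("A", "B")` is frequent when the window ends at time 3 but drops out once the window advances to time 5, which is the behavior the abstract describes for obsolete patterns. Setting `poi` large enough to cover all timestamps corresponds to the static or incremental cases mentioned above.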
Keywords :
data mining; database theory; trees (mathematics); DirApp; Pisa; direct appending; incremental database; progressive database; progressive sequential tree; sequential pattern discovery; sequential pattern mining; static database; Sequential Pattern; progressive databases
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2008.37
Filename :
4453822