DocumentCode
1920957
Title
Pre-clustering based sequential pattern mining
Author
Wu, Shaochun ; Wu, Gengfeng ; Jin, Shenjie
Author_Institution
Sch. of Comput. Sci. & Eng., Shanghai Univ., China
fYear
2004
fDate
14-16 Sept. 2004
Firstpage
1008
Lastpage
1013
Abstract
Sequential pattern mining is increasingly useful and essential in many scientific and commercial domains. The enormous size of available datasets and the potentially large number of candidate patterns demand efficient and scalable algorithms. In this paper, we present an efficient parallel algorithm named pre-clustering based sequential pattern mining (PCSPM). The algorithm groups sequence data into clusters according to a similarity definition, distributes the clusters to the nodes of a distributed-memory parallel computer, and forms node sets according to the clusters. By confining most communication to within each node set, it greatly reduces unnecessary communication among parallel computing nodes and therefore saves considerable communication time. The experimental results and the accompanying analysis show that the PCSPM algorithm is efficient and effective.
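The pre-clustering and node-set steps described in the abstract can be illustrated with a small sketch. The abstract does not state the similarity definition or how node sets are formed, so everything here is an assumption for illustration only: Jaccard similarity over item sets, a greedy single-pass clustering, round-robin node assignment, and the hypothetical names `jaccard`, `pre_cluster`, and `assign_node_sets`. It is not the PCSPM algorithm as published.

```python
def jaccard(a, b):
    """Jaccard similarity of the item sets of two sequences (assumed measure)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def pre_cluster(sequences, threshold=0.5):
    """Greedy single-pass clustering (an assumption, not the paper's method).

    Each sequence joins the first cluster whose representative (its first
    member) is at least `threshold` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of sequence indices.
    """
    clusters = []
    for i, seq in enumerate(sequences):
        for cluster in clusters:
            if jaccard(sequences[cluster[0]], seq) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def assign_node_sets(clusters, num_nodes):
    """Form one node set per cluster by dealing node ids out round-robin."""
    if num_nodes < len(clusters):
        raise ValueError("need at least one node per cluster")
    node_sets = [[] for _ in clusters]
    for node in range(num_nodes):
        node_sets[node % len(clusters)].append(node)
    return node_sets


if __name__ == "__main__":
    sequences = [
        ["a", "b", "c"],
        ["a", "b", "d"],
        ["x", "y"],
        ["x", "y", "z"],
        ["a", "c"],
    ]
    clusters = pre_cluster(sequences, threshold=0.5)
    print(clusters)                       # [[0, 1, 4], [2, 3]]
    print(assign_node_sets(clusters, 4))  # [[0, 2], [1, 3]]
```

In this sketch, nodes in the same node set hold similar sequences, so candidate-pattern counts would mostly need to be exchanged within a node set rather than across all nodes, which is the communication saving the abstract refers to.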
Keywords
data mining; parallel algorithms; pattern clustering; sequences; PCSPM algorithm; candidate patterns; data distribution; distributed memory parallel computer; parallel algorithm; parallel computing nodes; preclustering based sequential pattern mining; sequence data; similarity definition; Algorithm design and analysis; Clustering algorithms; Concurrent computing; Distributed computing; Parallel algorithms; Parallel processing
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer and Information Technology, 2004. CIT '04. The Fourth International Conference on
Print_ISBN
0-7695-2216-5
Type
conf
DOI
10.1109/CIT.2004.1357328
Filename
1357328