Title :
Pre-clustering based sequential pattern mining
Author :
Wu, Shaochun ; Wu, Gengfeng ; Jin, Shenjie
Author_Institution :
Sch. of Comput. Sci. & Eng., Shanghai Univ., China
Abstract :
Sequential pattern mining is becoming increasingly useful and essential in many scientific and commercial domains. The enormous size of available datasets and the potentially large number of candidate patterns demand efficient and scalable algorithms. In this paper, we present an efficient parallel algorithm named pre-clustering based sequential pattern mining (PCSPM). The algorithm groups sequence data into clusters according to a similarity definition, then distributes the clusters to the nodes of a distributed-memory parallel computer and forms node sets according to the clusters. By confining most communication to within each node set, it greatly reduces unnecessary communication among parallel computing nodes and therefore saves considerable communication time. The experimental results and the accompanying analysis show that the PCSPM algorithm is efficient and feasible.
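The pre-clustering and distribution steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state the similarity definition, the clustering procedure, or the node-assignment policy, so everything here is an assumption made for illustration: Jaccard similarity over the items of each sequence, greedy leader clustering with a hypothetical `threshold` parameter, and round-robin assignment of nodes to clusters. The function names `jaccard`, `pre_cluster`, and `assign_node_sets` are likewise hypothetical and are not taken from the paper.

```python
from typing import Dict, List

Sequence = List[str]


def jaccard(a: Sequence, b: Sequence) -> float:
    """Assumed similarity: Jaccard index over the sets of items in two sequences."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def pre_cluster(sequences: List[Sequence], threshold: float = 0.5) -> List[List[Sequence]]:
    """Greedy leader clustering (an assumption, not the paper's procedure).

    Each sequence joins the first cluster whose leader (first member) is at
    least `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters: List[List[Sequence]] = []
    for seq in sequences:
        for cluster in clusters:
            if jaccard(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


def assign_node_sets(num_clusters: int, num_nodes: int) -> Dict[int, List[int]]:
    """Round-robin assignment of nodes to clusters (assumed policy).

    Returns a mapping from cluster index to the node ids that form its node
    set, so that most communication can stay inside one node set.
    """
    if num_clusters < 1 or num_nodes < num_clusters:
        raise ValueError("need at least one cluster and one node per cluster")
    node_sets: Dict[int, List[int]] = {c: [] for c in range(num_clusters)}
    for node in range(num_nodes):
        node_sets[node % num_clusters].append(node)
    return node_sets


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b", "d"], ["x", "y"], ["x", "y", "z"]]
    clusters = pre_cluster(data, threshold=0.5)
    print(clusters)
    print(assign_node_sets(len(clusters), num_nodes=4))
```

In this sketch, each node set would then mine patterns mainly from its own cluster, which is the sense in which the abstract speaks of limiting communication to within a node set. The mining step itself is not shown.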
Keywords :
data mining; parallel algorithms; pattern clustering; sequences; PCSPM algorithm; candidate patterns; data distribution; distributed memory parallel computer; parallel algorithm; parallel computing nodes; preclustering based sequential pattern mining; sequence data; similarity definition; Algorithm design and analysis; Clustering algorithms; Concurrent computing; Distributed computing; Parallel processing
Conference_Title :
Computer and Information Technology, 2004. CIT '04. The Fourth International Conference on
Print_ISBN :
0-7695-2216-5
DOI :
10.1109/CIT.2004.1357328