Title :
Parallel Hierarchical Clustering on Market Basket Data
Author :
Wang, Baoying ; Ding, Qin ; Rahal, Imad
Author_Institution :
Waynesburg Univ., Waynesburg, PA
Abstract :
Data clustering has proven to be a promising data mining technique, and there have recently been many attempts at clustering market-basket data. In this paper, we propose a parallelized hierarchical clustering approach for market-basket data (PH-Clustering), implemented using MPI. Based on an analysis of the major clustering steps, we adopt a partial local and partial global approach that decreases computation time while keeping communication time to a minimum. Load balancing is considered throughout, especially at the data partitioning stage. Our experimental results demonstrate that PH-Clustering speeds up sequential clustering by a large margin. The larger the data size, the more significant the speedup when the number of processors is large. Our results also show that the number of items has more impact on the performance of PH-Clustering than the number of transactions.
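The abstract does not describe how the data partitioning is done, so the following is only a minimal sketch of one common way to keep the load even: contiguous block partitioning, where the block sizes differ by at most one transaction. The function name `balanced_partition` and its parameters are hypothetical, and this is not presented as the authors' scheme.

```python
def balanced_partition(n_transactions, n_procs):
    """Split transaction indices 0..n_transactions-1 into contiguous blocks.

    Hypothetical helper (an assumption, not taken from the paper): returns one
    half-open (start, end) range per process, with sizes differing by at most one.
    """
    if n_procs <= 0:
        raise ValueError("n_procs must be positive")
    # Every process gets `base` transactions; the first `extra` get one more.
    base, extra = divmod(n_transactions, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # Each tuple is the slice of the transaction list a given rank would own.
    for rank, (lo, hi) in enumerate(balanced_partition(10, 3)):
        print(f"rank {rank}: transactions [{lo}, {hi})")
```

In an MPI program, each rank could compute its own range from its rank number and the process count, so no communication would be needed to agree on the split. Balancing by transaction count alone is a simplification: the abstract reports that the number of items affects performance more than the number of transactions.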
Keywords :
data analysis; message passing; parallel algorithms; pattern clustering; resource allocation; MPI; data clustering; data mining; data partitioning; load balance issue; market basket data; parallel hierarchical clustering; parallelized hierarchical clustering; partial global approach; partial local approach; sequential clustering; Conferences; Data analysis; Data mining; Data structures; Decision making; Educational institutions; Itemsets; Message passing; Velocity measurement; Weight measurement; data mining; hierarchical clustering; market basket data; parallel computing;
Conference_Title :
Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3503-6
Electronic_ISBN :
978-0-7695-3503-6
DOI :
10.1109/ICDMW.2008.32