DocumentCode
2191126
Title
Parallel EM-Clustering: Fast Convergence by Asynchronous Model Updates
Author
Plant, Claudia ; Böhm, Christian
Author_Institution
Florida State Univ., Tallahassee, FL, USA
fYear
2010
fDate
13 Dec. 2010
Firstpage
178
Lastpage
185
Abstract
The data explosion in many applications requires efficient data mining solutions. Fortunately, emerging technologies such as grid and cloud computing, high-performance multi-core processors, and graphics processing units provide the potential to keep pace with the data explosion and open up new opportunities for designing efficient algorithms. In this paper, we propose a parallel variant of the Expectation Maximization (EM) algorithm suitable for clustering large data sets in a distributed environment. The conventional EM algorithm alternates between two phases: in the E-step, points are assigned to the clusters, and in the M-step, the cluster models are updated. The basic idea of our approach is to allow asynchronous model updates for faster convergence and better use of the available resources. The frequency of the updates can be flexibly adjusted to the specific characteristics of the environment, including communication costs and the computing power of the individual devices. An extensive experimental evaluation demonstrates the benefits of our approach.
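The two phases of the conventional EM algorithm named in the abstract can be illustrated with a minimal sketch. It assumes a one-dimensional Gaussian mixture and uses only the Python standard library. The function names (`e_step`, `m_step`, `em_1d`), the initial values, and the fixed iteration count are hypothetical choices made for illustration. The sketch shows only the sequential baseline, not the asynchronous parallel variant proposed in the paper, whose details this record does not give.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def e_step(points, weights, means, variances):
    """E-step: softly assign each point to the clusters.

    Returns one row per point; each row holds the responsibility of
    every cluster for that point and sums to 1.
    """
    resp = []
    for x in points:
        dens = [w * gaussian_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        total = sum(dens) or 1e-300  # guard against numerical underflow
        resp.append([d / total for d in dens])
    return resp


def m_step(points, resp, k, min_var=1e-6):
    """M-step: re-estimate the cluster models from the soft assignments."""
    n = len(points)
    weights, means, variances = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp) or 1e-300  # effective cluster size
        mean = sum(r[j] * x for r, x in zip(resp, points)) / nj
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, points)) / nj
        weights.append(nj / n)
        means.append(mean)
        variances.append(max(var, min_var))  # keep the variance positive
    return weights, means, variances


def em_1d(points, init_means, iterations=50):
    """Sequential EM: a full E-step followed by a full M-step, repeated."""
    k = len(init_means)
    weights = [1.0 / k] * k
    means = list(init_means)
    variances = [1.0] * k
    for _ in range(iterations):
        resp = e_step(points, weights, means, variances)
        weights, means, variances = m_step(points, resp, k)
    return weights, means, variances


if __name__ == "__main__":
    rng = random.Random(0)
    data = ([rng.gauss(0.0, 1.0) for _ in range(300)]
            + [rng.gauss(8.0, 1.0) for _ in range(300)])
    print(em_1d(data, init_means=[1.0, 6.0]))
```

In this baseline the model changes only once per full pass over the data. The abstract's proposal is to relax exactly that synchronization point by letting model updates happen asynchronously.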
Keywords
convergence; data mining; expectation-maximisation algorithm; parallel algorithms; pattern clustering; E-step; M-step; asynchronous model updates; communication cost; computing power; data explosion; expectation maximization; fast convergence; parallel EM clustering; parallel variant
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
Conference_Location
Sydney, NSW
Print_ISBN
978-1-4244-9244-2
Electronic_ISBN
978-0-7695-4257-7
Type
conf
DOI
10.1109/ICDMW.2010.53
Filename
5693298