Title :
Expectation maximization algorithm made fast for large scale data
Author :
Nishchal K. Verma;Satyam Dwivedi;Rahul K. Sevakula
Author_Institution :
Department of Electrical Engineering, Indian Institute of Technology Kanpur, India
Abstract :
The Expectation Maximization (EM) algorithm is an iterative algorithm that is often used for estimating the parameters of a Gaussian Mixture Model. Due to its high computational cost, it is generally inconvenient to use on large-scale datasets. This paper proposes three strategies for improving the time efficiency of the algorithm when applied to large datasets. The first strategy uses grid-based sampling to reduce the dataset size by approximately ten times and then applies a novel parameter initialization method that initializes the parameters close to their final values. The second strategy is an adaptive approach that uses past errors to make the algorithm converge faster, i.e., in fewer iterations. The third strategy introduces a way of performing parallel computations in the EM algorithm that differs from earlier approaches, distributing the computation among parallel threads while keeping the communication load to a minimum. Extensive experimentation on two-dimensional synthetic datasets generated from multiple Gaussian distribution components has revealed that the proposed strategies can reduce the processing time by up to ten times, with the first and third strategies performing especially well.
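The sketch below illustrates only the general idea behind the first strategy: shrink the dataset with a grid-based reduction, then run standard EM for a GMM on the representatives. It is a minimal sketch, not the authors' implementation; the function names (grid_sample, em_gmm), the grid resolution (cells_per_axis), and the use of cell means as representatives are illustrative assumptions, and the paper's parameter initialization, adaptive acceleration, and parallelization strategies are not shown.

```python
# Minimal sketch (not the paper's exact method): grid-based reduction of a 2-D
# dataset followed by plain EM for a Gaussian Mixture Model.
import numpy as np

def grid_sample(X, cells_per_axis=30):
    """Keep one representative point (the cell mean) per occupied grid cell.
    The grid resolution is an illustrative assumption."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    idx = np.floor((X - lo) / (hi - lo + 1e-12) * cells_per_axis).astype(int)
    keys = idx[:, 0] * (cells_per_axis + 1) + idx[:, 1]   # 2-D data assumed
    reps = [X[keys == k].mean(axis=0) for k in np.unique(keys)]
    return np.vstack(reps)

def em_gmm(X, k, n_iter=100, tol=1e-6, seed=0):
    """Standard EM for a k-component GMM (no acceleration or parallelism)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, k, replace=False)]                # initial means
    cov = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * k)   # initial covariances
    pi = np.full(k, 1.0 / k)                               # initial weights
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: per-point, per-component responsibilities
        resp = np.empty((n, k))
        for j in range(k):
            diff = X - mu[j]
            inv = np.linalg.inv(cov[j])
            expo = -0.5 * np.einsum('ni,ij,nj->n', diff, inv, diff)
            norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov[j]))
            resp[:, j] = pi[j] * norm * np.exp(expo)
        ll = np.log(resp.sum(axis=1) + 1e-300).sum()       # log-likelihood
        resp /= resp.sum(axis=1, keepdims=True) + 1e-300
        # M-step: update weights, means, covariances
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - mu[j]
            cov[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
        if abs(ll - prev_ll) < tol:                        # convergence check
            break
        prev_ll = ll
    return pi, mu, cov

# Usage: fit on the reduced set, which is substantially smaller than the original.
X = np.random.default_rng(1).normal(size=(50_000, 2))
X_small = grid_sample(X)
pi, mu, cov = em_gmm(X_small, k=3)
```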
Keywords :
"Signal processing algorithms","Clustering algorithms","Parallel processing","Convergence","Covariance matrices","Algorithm design and analysis","Gaussian mixture model"
Conference_Titel :
2015 IEEE Workshop on Computational Intelligence: Theories, Applications and Future Directions (WCI)
DOI :
10.1109/WCI.2015.7495515