Title :
Normalized EM algorithm for tumor clustering using gene expression data
Author :
Phuong, Nguyen Minh ; Vinh, Nguyen Xuan
Author_Institution :
Sch. of Electr. Eng. & Telecommun., Univ. of New South Wales, Kensington, NSW
Abstract :
Most proposed clustering approaches are heuristic in nature. As a result, their clustering outcomes are difficult to interpret from a statistical standpoint. Mixture model-based clustering has received much attention from the gene expression community because of its sound statistical background and its flexibility in data modeling. However, current clustering algorithms that follow the model-based framework suffer from two serious drawbacks. First, their performance depends critically on the starting values of their iterative clustering procedures. Second, they cannot work directly with very high dimensional data sets, whose dimensionality may reach the thousands. We propose a novel normalized Expectation-Maximization (EM) algorithm to tackle both challenges. The normalized EM is stable even with random initializations of its iterative EM procedure. Its stability is demonstrated by comparing its performance with that of related clustering algorithms such as the unnormalized EM (the conventional EM algorithm for Gaussian mixture model-based clustering) and spherical k-means. Furthermore, the normalized EM is the first mixture model-based clustering algorithm shown to be stable when working directly with very high dimensional microarray data sets in the sample clustering problem, where the number of genes is much larger than the number of samples. Finally, we uncover an interesting property of the convergence speed of the normalized EM with respect to the squared radius of the hypersphere in its corresponding statistical model.
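The abstract does not spell out the model, so the following is only a minimal, hypothetical sketch of one way a "normalized" EM could work, not the authors' algorithm. It assumes that (a) every sample is rescaled onto a hypersphere of squared radius `r2`, (b) each mixture component is a unit-variance isotropic Gaussian, and (c) each component mean is re-projected onto the same hypersphere after every M-step. The function names `project`, `e_step` and `normalized_em` and the parameter `r2` are illustrative inventions. Under these assumptions, the log-density reduces to a constant plus the inner product x·μ = r2·cos θ, so the responsibilities are a softmax that sharpens as `r2` grows. This offers one plausible intuition for why convergence speed could depend on the squared radius.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(x, r2):
    """Rescale x so its squared Euclidean norm equals r2 (assumed normalization)."""
    norm = math.sqrt(dot(x, x))
    if norm == 0.0:
        return list(x)  # a zero vector has no direction; leave it unchanged
    scale = math.sqrt(r2) / norm
    return [v * scale for v in x]


def e_step(X, means, weights):
    """Posterior responsibilities for unit-variance isotropic Gaussians.

    If ||x||^2 = ||mu||^2 = r2, then log N(x | mu, I) = const - r2 + x.mu,
    so each posterior is a softmax over log(weight) + x.mu.
    """
    resp = []
    for x in X:
        scores = [math.log(w) + dot(x, mu) for w, mu in zip(weights, means)]
        top = max(scores)  # subtract the max for numerical stability
        e = [math.exp(s - top) for s in scores]
        z = sum(e)
        resp.append([v / z for v in e])
    return resp


def normalized_em(data, k, r2=10.0, n_iter=100, seed=0):
    """Hypothetical sketch: EM for an isotropic Gaussian mixture on the r2-hypersphere."""
    rng = random.Random(seed)
    X = [project(x, r2) for x in data]
    n, d = len(X), len(X[0])
    means = [list(x) for x in rng.sample(X, k)]  # random initialization from the data
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp = e_step(X, means, weights)
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)  # guard against an empty component
            weights[j] = nj / n
            raw = [sum(resp[i][j] * X[i][t] for i in range(n)) / nj for t in range(d)]
            means[j] = project(raw, r2)  # keep each mean on the same hypersphere
    resp = e_step(X, means, weights)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, means, weights
```

Calling `normalized_em(samples, k=2, r2=10.0)` on a list of expression profiles (one list of gene values per sample) returns hard labels, the projected means and the mixing weights. Because only directions matter after projection, the per-sample dimension can be large without forming any covariance matrix. This design choice is assumed here for illustration, not taken from the paper.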
Keywords :
expectation-maximisation algorithm; genetics; medical computing; molecular biophysics; pattern clustering; statistical analysis; tumours; Gaussian mixture model-based clustering; gene expression; normalized expectation-maximization algorithm; spherical k-means; tumor clustering; very high dimensional microarray data; Bioinformatics; Biological processes; Biological system modeling; Clustering algorithms; Gaussian distribution; Gene expression; Genomics; Iterative algorithms; Neoplasms; Stability;
Conference_Title :
BioInformatics and BioEngineering, 2008. BIBE 2008. 8th IEEE International Conference on
Conference_Location :
Athens
Print_ISBN :
978-1-4244-2844-1
Electronic_ISBN :
978-1-4244-2845-8
DOI :
10.1109/BIBE.2008.4696683