Title :
Model based clustering for tandem mass spectrum quality assessment
Author :
Ding, Jiarui ; Shi, Jinhong ; Wu, Fang-Xiang
Author_Institution :
Dept. of Mech. Eng., Univ. of Saskatchewan, Saskatoon, SK, Canada
Abstract :
Several computational methods have been proposed to assess the quality of tandem mass spectra. These methods range from supervised to unsupervised algorithms and from discriminative to generative models. Existing unsupervised learning algorithms for tandem mass spectra are not based on probabilistic models, so they do not provide probabilities for spectrum quality assessment. In this study, the distributions of high quality and poor quality spectra are modeled by a mixture of Gaussian distributions. The Expectation Maximization (EM) algorithm is used to estimate the parameters of the Gaussian mixture model. A spectrum is assigned to the high quality or poor quality cluster according to its posterior probability. Experiments are conducted on two datasets: ISB and TOV. The results show that about 57.64% and 66.38% of poor quality spectra can be removed without losing more than 10% of high quality spectra for the ISB and TOV datasets, respectively. This indicates that clustering, as an exploratory data analysis tool, is valuable for the quality assessment of tandem mass spectra without using a pre-labeled training dataset.
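The approach described in the abstract (fit a Gaussian mixture with EM, then assign each spectrum by posterior probability) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each spectrum has already been reduced to a single hypothetical quality score, uses two one-dimensional components, takes the component with the larger mean as the "high quality" cluster, and runs on synthetic data. The function names and the 0.5 threshold are assumptions made for illustration only.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(xs, n_iter=200, min_var=1e-6):
    """Fit a 1-D two-component Gaussian mixture with EM.

    Returns (weights, means, variances), each a list of length 2.
    """
    n = len(xs)
    overall_mean = sum(xs) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in xs) / n, min_var)

    # Crude initialisation (an assumption): equal weights, means at the extremes.
    weights = [0.5, 0.5]
    means = [min(xs), max(xs)]
    variances = [overall_var, overall_var]

    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total] if total > 0 else [0.5, 0.5])

        # M-step: re-estimate weights, means and variances from responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, min_var)

    return weights, means, variances


def posterior_high(x, weights, means, variances):
    """Posterior probability that x belongs to the larger-mean component."""
    hi = 0 if means[0] > means[1] else 1
    p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
    total = p[0] + p[1]
    return p[hi] / total if total > 0 else 0.5


# Synthetic scores (not ISB or TOV data): 300 "poor" and 200 "high" spectra.
random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(300)]
scores += [random.gauss(5.0, 1.0) for _ in range(200)]

weights, means, variances = fit_two_component_gmm(scores)

# Keep spectra whose posterior of being high quality exceeds an assumed threshold.
THRESHOLD = 0.5
kept = [s for s in scores if posterior_high(s, weights, means, variances) > THRESHOLD]
```

Lowering `THRESHOLD` retains more high quality spectra at the cost of removing fewer poor quality ones, which is the kind of trade-off the reported removal rates reflect. With multi-dimensional spectrum features, the same E-step and M-step apply with mean vectors and covariance matrices in place of the scalar means and variances.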
Keywords :
Gaussian distribution; biology computing; data analysis; expectation-maximisation algorithm; learning (artificial intelligence); mass spectroscopic chemical analysis; molecular biophysics; parameter estimation; pattern clustering; proteins; spectroscopy computing; Gaussian distribution; Gaussian mixture model; computational methods; expectation maximization algorithm; exploratory data analysis; model based clustering; parameters estimation; peptide identification pipeline; posterior probability; prelabeled training dataset; spectral datasets; supervised algorithms; tandem mass spectrum quality assessment; unsupervised algorithms; Algorithms; Cluster Analysis; Databases, Protein; Humans; Models, Biological; Quality Control; Tandem Mass Spectrometry;
Conference_Title :
Engineering in Medicine and Biology Society, 2009. EMBC 2009. Annual International Conference of the IEEE
Conference_Location :
Minneapolis, MN
Print_ISBN :
978-1-4244-3296-7
Electronic_ISBN :
1557-170X
DOI :
10.1109/IEMBS.2009.5332499