Title :
Support Vector Machine Classification of Probability Models and Peptide Features for Improved Peptide Identification from Shotgun Proteomics
Author :
Webb-Robertson, Bobbie-Jo M. ; Oehmen, Christopher S. ; Cannon, William R.
Author_Institution :
Pacific Northwest Nat. Lab., Richland
Abstract :
Mass spectrometry (MS)-based proteomics is a powerful and popular high-throughput process for characterizing the global protein content of a sample. In shotgun proteomics, proteins are typically digested into fragments (peptides) prior to mass analysis, and the presence of a protein is inferred from the identification of its constituent peptides. Thus, accurate proteome characterization depends on the accuracy of this peptide identification step. Database search routines generate predicted spectra for all peptides derived from known genome information and identify a peptide by 'matching' an experimental spectrum to a predicted one. However, due to factors such as incomplete fragmentation, this process produces a large number of false positives. We present a new scoring algorithm that integrates probabilistic database scoring metrics (from the MSPolygraph program) with physico-chemical properties in a support vector machine (SVM). We demonstrate that this peptide identification classifier SVM (PICS) score is not only more accurate than the single best database scoring metric but also significantly more accurate than models derived using linear discriminant analysis, a decision tree, or an artificial neural network.
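The abstract describes feeding probabilistic database scores and physico-chemical peptide properties into one SVM that separates correct identifications from false positives. The following is a minimal sketch of that idea, not the authors' PICS implementation. It assumes a linear soft-margin SVM trained by stochastic subgradient descent, a hypothetical three-feature layout (database score, peptide length, net charge), and made-up toy data that illustrates the mechanics only and reports no results.

```python
"""Minimal sketch: a linear SVM over combined score + peptide features.

Assumptions (not from the paper): each candidate identification is a
hypothetical vector [database_score, peptide_length, net_charge];
labels are +1 (correct) / -1 (false positive); the model is a plain
linear soft-margin SVM, not necessarily the kernel used for PICS.
"""
from statistics import fmean, pstdev


def standardize_fit(X):
    """Per-feature mean and population std (a zero std is replaced by 1)."""
    cols = list(zip(*X))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]
    return means, stds


def standardize(x, means, stds):
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2*||w||^2 + mean hinge loss with per-sample subgradient steps."""
    means, stds = standardize_fit(X)  # features live on very different scales
    Z = [standardize(x, means, stds) for x in X]
    w = [0.0] * len(Z[0])
    b = 0.0
    for _ in range(epochs):
        for z, label in zip(Z, y):
            margin = label * (sum(wj * zj for wj, zj in zip(w, z)) + b)
            # L2 shrinkage on every step; hinge subgradient only when margin < 1
            w = [wj - lr * lam * wj for wj in w]
            if margin < 1:
                w = [wj + lr * label * zj for wj, zj in zip(w, z)]
                b += lr * label
    return w, b, means, stds


def svm_score(x, w, b, means, stds):
    """Decision value; > 0 means 'accept this peptide identification'."""
    z = standardize(x, means, stds)
    return sum(wj * zj for wj, zj in zip(w, z)) + b


# Toy, made-up data for illustration only.
X = [[8, 10, 1], [9, 12, 2], [10, 14, 1], [11, 16, 2],   # correct matches
     [4, 10, 1], [3, 12, 2], [2, 14, 1], [1, 16, 2]]     # false positives
y = [1, 1, 1, 1, -1, -1, -1, -1]
w, b, means, stds = train_linear_svm(X, y)
print(svm_score([12, 13, 1.5], w, b, means, stds) > 0)  # prints True
```

Standardizing first keeps a feature with a large numeric range (such as peptide length) from dominating the margin; in this toy set only the score column separates the classes, so its learned weight ends up largest.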
Keywords :
biology computing; mass spectrometer accessories; probability; proteins; query processing; support vector machines; MSPolygraph program; constituent peptides; database search routines; genome information; global protein content; improved peptide identification; mass analysis; mass spectrometry-based proteomics; peptide features; peptide identification classifier; physico-chemical properties; predicted spectrum matching; probabilistic database scoring metrics; probability models; proteome characterization; scoring algorithm; shotgun proteomics; support vector machine classification; Bioinformatics; Classification tree analysis; Databases; Genomics; Mass spectroscopy; Peptides; Proteins; Proteomics; Support vector machine classification; Support vector machines;
Conference_Title :
Machine Learning and Applications, 2007. ICMLA 2007. Sixth International Conference on
Conference_Location :
Cincinnati, OH
Print_ISBN :
978-0-7695-3069-7
DOI :
10.1109/ICMLA.2007.17