DocumentCode :
8119
Title :
A Combinatorial Perspective of the Protein Inference Problem
Author :
Chao Yang ; Zengyou He ; Weichuan Yu
Author_Institution :
Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China
Volume :
10
Issue :
6
fYear :
2013
fDate :
Nov.-Dec. 2013
Firstpage :
1542
Lastpage :
1547
Abstract :
In a shotgun proteomics experiment, proteins are the most biologically meaningful output. The success of proteomics studies depends on the ability to identify proteins accurately and efficiently. Many methods have been proposed to facilitate the identification of proteins from peptide identification results. However, the relationship between protein identification and peptide identification has not been thoroughly explained. In this paper, we take a combinatorial perspective on the protein inference problem. We employ combinatorial mathematics to calculate the conditional protein probabilities (the protein probability is the probability that a protein is correctly identified) under three assumptions, which lead to a lower bound, an upper bound, and an empirical estimate of protein probabilities, respectively. The combinatorial perspective enables us to obtain an analytical expression for protein inference. In experiments on two data sets of standard protein mixtures and two data sets of real samples, our method achieves results comparable to those of ProteinProphet in a more efficient manner. Based on our model, we study the impact of unique peptides and degenerate peptides (peptides shared by at least two proteins) on protein probabilities. We also study the relationship between our model and ProteinProphet. We name our program ProteinInfer. Its Java source code, our supplementary document, and experimental results are available at http://bioinformatics.ust.hk/proteininfer.
Keywords :
Java; bioinformatics; combinatorial mathematics; inference mechanisms; probability; proteins; proteomics; Java source code; ProteinInfer program; ProteinProphet program; degenerate peptides; protein identification; protein inference problem; protein mixtures; protein probability estimation; proteomics experiment; Bioinformatics; Peptides; Probability; Proteins; Upper bound; Protein identification; analytical formulation; combinatorial perspective; probability bounds
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2013.110
Filename :
6600680