• DocumentCode
    8119
  • Title

    A Combinatorial Perspective of the Protein Inference Problem

  • Author

    Chao Yang ; Zengyou He ; Weichuan Yu

  • Author_Institution
    Dept. of Electron. & Comput. Eng., Hong Kong Univ. of Sci. & Technol., Hong Kong, China
  • Volume
    10
  • Issue
    6
  • fYear
    2013
  • fDate
    Nov.-Dec. 2013
  • Firstpage
    1542
  • Lastpage
    1547
  • Abstract
    In a shotgun proteomics experiment, proteins are the most biologically meaningful output. The success of proteomics studies depends on the ability to accurately and efficiently identify proteins. Many methods have been proposed to facilitate the identification of proteins from peptide identification results. However, the relationship between protein identification and peptide identification has not been thoroughly explained before. In this paper, we devote ourselves to a combinatorial perspective of the protein inference problem. We employ combinatorial mathematics to calculate the conditional protein probabilities (protein probability means the probability that a protein is correctly identified) under three assumptions, which lead to a lower bound, an upper bound, and an empirical estimation of protein probabilities, respectively. The combinatorial perspective enables us to obtain an analytical expression for protein inference. Our method achieves comparable results with ProteinProphet in a more efficient manner in experiments on two data sets of standard protein mixtures and two data sets of real samples. Based on our model, we study the impact of unique peptides and degenerate peptides (degenerate peptides are peptides shared by at least two proteins) on protein probabilities. Meanwhile, we also study the relationship between our model and ProteinProphet. We name our program ProteinInfer. Its Java source code, our supplementary document and experimental results are available at: >http://bioinformatics.ust.hk/proteininfer.
  • Keywords
    Java; bioinformatics; combinatorial mathematics; inference mechanisms; probability; proteins; proteomics; Java source code; ProteinInfer program; ProteinProphet program; combinatorial mathematics; degenerate peptides; protein identification; protein inference problem; protein mixtures; protein probability estimation; proteomics experiment; Bioinformatics; Peptides; Probability; Proteins; Upper bound; Protein identification; analytical formulation; combinatorial perspective; probability bounds;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    ieee
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2013.110
  • Filename
    6600680