• DocumentCode
    1442613
  • Title
    Prediction of Protein Functions with Gene Ontology and Interspecies Protein Homology Data
  • Author
    Mitrofanova, Antonina ; Pavlovic, Vladimir ; Mishra, Bud
  • Author_Institution
    Dept. of Comput. Sci., New York Univ., New York, NY, USA
  • Volume
    8
  • Issue
    3
  • fYear
    2011
  • Firstpage
    775
  • Lastpage
    784
  • Abstract
    Accurate computational prediction of protein functions increasingly relies on network-inspired models for protein function transfer. This task can become challenging for proteins isolated in their own network or for those with poor or uncharacterized neighborhoods. Here, we present a novel probabilistic chain-graph-based approach for predicting protein functions that connects the networks of two (or more) different species by links of high interspecies sequence homology. In this way, proteins are able to “exchange” functional information with their neighbors-homologs from a different species. Knowledge of interspecies relationships, such as sequence homology, can become crucial when information from other data sources, including protein-protein interactions or the cellular locations of proteins, is limited. We further enhance our model to account for Gene Ontology dependencies by linking multiple distinct but related functional ontology categories within and across species. The resulting networks are of significantly higher complexity than most traditional protein network models. We comprehensively benchmark our method by applying it to the two largest protein networks, the Yeast and the Fly. The joint Fly-Yeast network provides substantial improvements in precision, accuracy, and false positive rate over networks that consider either source in isolation. At the same time, the new model retains computational efficiency similar to that of the simpler networks.
  • Keywords
    bioinformatics; data analysis; genetics; genomics; macromolecules; molecular biophysics; proteins; gene ontology; probabilistic chain-graph-based method; protein functions; protein homology data; protein sequence homology; protein-protein interaction; Bioinformatics; Biological system modeling; Computational modeling; Computer networks; Computer science; Joining processes; Ontologies; Predictive models; Proteins; Sequences; Biology and genetics; bioinformatics (genome or protein) databases; machine learning; Animals; Artificial Intelligence; Computational Biology; Hymenoptera; Models, Genetic; Models, Statistical; Protein Interaction Domains and Motifs; Proteins; Sequence Homology, Amino Acid; Species Specificity; Statistics, Nonparametric; Vocabulary, Controlled; Yeasts;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2010.15
  • Filename
    5432154