Title :
An Integrated Framework for Functional Annotation of Protein Structural Domains
Author :
Lei Deng ; Zhigang Chen
Author_Institution :
Sch. of Software, Central South Univ., Changsha, China
Abstract :
Structural domains are evolutionary and functional units of proteins and play a critical role in comparative and functional genomics. Reliable computational assignment of domain function is essential for understanding whole-protein functions. However, functional annotations are conventionally assigned to full-length proteins rather than to individual structural domains. In this article, we present Structural Domain Annotation (SDA), a novel computational approach for predicting the functions of SCOP structural domains. SDA uses a Bayesian network to integrate heterogeneous information sources, including structure-alignment-based protein-SCOP mapping features, InterPro2GO mapping information, PSSM profiles, and sequence neighborhood features. By applying SDA to annotate SCOP domains with Gene Ontology terms on a large scale, we obtained a database of SCOP domain to Gene Ontology mappings that covers 162,000 of the approximately 166,900 domains in SCOPe 2.03 (>97 percent), together with their predicted Gene Ontology functions. We benchmarked SDA on a single-domain protein dataset and on an independent dataset from different species. Comparative studies show that SDA significantly outperforms existing function prediction methods for structural domains in terms of coverage and maximum F-measure.
Keywords :
Bayes methods; biology computing; evolution (biological); genomics; molecular biophysics; molecular configurations; ontologies (artificial intelligence); proteins; Bayesian network; InterPro2GO mapping information; PSSM profiles; SCOP structural domains; SCOPe 2.03; SDA method; benchmarked SDA; comparative genomics; computational assignment; evolutionary units; full-length proteins; functional annotation; functional genomics; functional units; heterogeneous information sources; individual structural domains; integrated framework; large-scale annotating gene ontology terms; maximum F-measure; protein structural domains; sequence neighborhood features; single-domain protein dataset; structural domain annotation; structure alignment based protein-SCOP mapping; whole-protein functions; Bioinformatics; Databases; Ontologies; Support vector machines; PSSM; SCOP domain function; structure alignment
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2015.2389213