DocumentCode
8907
Title
An Integrated Framework for Functional Annotation of Protein Structural Domains
Author
Lei Deng ; Zhigang Chen
Author_Institution
Sch. of Software, Central South Univ., Changsha, China
Volume
12
Issue
4
fYear
2015
fDate
July-Aug. 1 2015
Firstpage
902
Lastpage
913
Abstract
Structural domains are evolutionary and functional units of proteins and play a critical role in comparative and functional genomics. Reliable computational assignment of domain function is essential for understanding whole-protein function. However, functional annotations are conventionally assigned to full-length proteins rather than to individual structural domains. In this article, we present Structural Domain Annotation (SDA), a novel computational approach for predicting the functions of SCOP structural domains. SDA uses a Bayesian network to integrate heterogeneous information sources, including structure-alignment-based protein-SCOP mapping features, InterPro2GO mapping information, PSSM profiles, and sequence neighborhood features. By annotating SCOP domains with Gene Ontology terms on a large scale using SDA, we obtained a database of SCOP domain to Gene Ontology mappings that covers 162,000 of the approximately 166,900 domains in SCOPe 2.03 (>97 percent) together with their predicted Gene Ontology functions. We benchmarked SDA on a single-domain protein dataset and on an independent dataset from different species. Comparative studies show that SDA significantly outperforms existing function prediction methods for structural domains in terms of coverage and maximum F-measure.
Keywords
Bayes methods; biology computing; evolution (biological); genomics; molecular biophysics; molecular configurations; ontologies (artificial intelligence); proteins; Bayesian network; InterPro2GO mapping information; PSSM Profiles; SCOP structural domains; SCOPe 2.03; SDA method; benchmarked SDA; comparative genomics; computational assignment; evolutionary units; full-length proteins; functional annotation; functional genomics; functional units; heterogeneous information sources; individual structural domains; integrated framework; large-scale annotating gene ontology terms; maximum F-measure; protein structural domains; sequence neighborhood features; single-domain protein dataset; structural domain annotation; structure alignment based protein-SCOP mapping; whole-protein functions; Bioinformatics; Databases; Ontologies; Proteins; Support vector machines; PSSM; Scop domain function; structure alignment
fLanguage
English
Journal_Title
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
1545-5963
Type
jour
DOI
10.1109/TCBB.2015.2389213
Filename
7004798