DocumentCode
1240103
Title
DomNet: Protein Domain Boundary Prediction Using Enhanced General Regression Network and New Profiles
Author
Yoo, Paul D. ; Sikder, Abdur R. ; Taheri, Javid ; Zhou, Bing Bing ; Zomaya, Albert Y.
Author_Institution
Adv. Networks Res. Group, Univ. of Sydney, Sydney, NSW
Volume
7
Issue
2
fYear
2008
fDate
6/1/2008
Firstpage
172
Lastpage
181
Abstract
Accurate and stable prediction of protein domain boundaries is an important step toward predicting protein structure, function, evolution, and design. Recent research on protein domain boundary prediction has been based mainly on widely known machine learning techniques. In this paper, we propose a new machine-learning-based domain predictor, DomNet, that shows more accurate and stable predictive performance than existing state-of-the-art models. DomNet is trained using a novel compact domain profile, secondary structure, solvent accessibility information, and an interdomain linker index to detect possible domain boundaries in a target sequence. The performance of the proposed model was compared with that of nine different machine learning models on the Benchmark_2 dataset in terms of accuracy, sensitivity, specificity, and correlation coefficient. DomNet achieved the best performance, with 71% accuracy for domain boundary identification in multidomain proteins. On the CASP7 benchmark dataset, it again outperformed contemporary domain boundary predictors such as DOMpro, DomPred, DomSSEA, DomCut, and DomainDiscovery.
Keywords
biochemistry; biology computing; learning (artificial intelligence); molecular biophysics; molecular configurations; neural nets; proteins; CASP7 benchmark dataset; DomNet; correlation coefficient; enhanced general regression neural network; interdomain linker index; machine learning techniques; protein design; protein domain boundary prediction; protein evolution; protein function; protein structure; secondary structure; solvent accessibility information; target sequence; Australia; Encoding; Evolution (biology); Helium; Information analysis; Machine learning; Predictive models; Proteins; Sequences; Solvents; Domain boundary prediction; domain linker index; machine learning; sequence encoding; sequence profile; Amino Acid Sequence; Computer Simulation; Models, Chemical; Models, Molecular; Molecular Sequence Data; Protein Structure, Tertiary; Proteins; Regression Analysis; Sequence Analysis, Protein;
fLanguage
English
Journal_Title
NanoBioscience, IEEE Transactions on
Publisher
IEEE
ISSN
1536-1241
Type
jour
DOI
10.1109/TNB.2008.2000747
Filename
4538003