Title :
DomNet: Protein Domain Boundary Prediction Using Enhanced General Regression Network and New Profiles
Author :
Yoo, Paul D. ; Sikder, Abdur R. ; Taheri, Javid ; Zhou, Bing Bing ; Zomaya, Albert Y.
Author_Institution :
Adv. Networks Res. Group, Univ. of Sydney, Sydney, NSW
fDate :
6/1/2008
Abstract :
The accurate and stable prediction of protein domain boundaries is an important avenue for the prediction of protein structure, function, evolution, and design. Recent research on protein domain boundary prediction has been mainly based on widely known machine learning techniques. In this paper, we propose a new machine-learning-based domain predictor, DomNet, that shows more accurate and stable predictive performance than existing state-of-the-art models. DomNet is trained using a novel compact domain profile, secondary structure, solvent accessibility information, and an interdomain linker index to detect possible domain boundaries in a target sequence. The performance of the proposed model was compared with that of nine different machine learning models on the Benchmark_2 dataset in terms of accuracy, sensitivity, specificity, and correlation coefficient. DomNet achieved the best performance, with 71% accuracy for domain boundary identification in multidomain proteins. On the CASP7 benchmark dataset, it again demonstrated superior performance to contemporary domain boundary predictors such as DOMpro, DomPred, DomSSEA, DomCut, and DomainDiscovery.
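The four evaluation measures named in the abstract can be sketched from a confusion matrix of boundary predictions. The snippet below is a minimal illustration, not the authors' evaluation code. It assumes a binary boundary/non-boundary labeling and that "correlation coefficient" refers to the Matthews correlation coefficient; the function name `boundary_metrics` and the example counts are hypothetical.

```python
import math


def boundary_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity, specificity, and Matthews correlation.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives, and false negatives for boundary predictions.
    Ratios with a zero denominator are reported as 0.0 (an assumption
    made here for convenience).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # Matthews correlation coefficient (assumed meaning of
    # "correlation coefficient" in the abstract).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, value in boundary_metrics(tp=40, tn=30, fp=20, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Reporting all four measures together guards against a predictor that scores well on accuracy simply because non-boundary residues dominate the data.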
Keywords :
biochemistry; biology computing; learning (artificial intelligence); molecular biophysics; molecular configurations; neural nets; proteins; CASP7 benchmark dataset; DomNet; correlation coefficient; enhanced general regression neural network; interdomain linker index; machine learning techniques; protein design; protein domain boundary prediction; protein evolution; protein function; protein structure; secondary structure; solvent accessibility information; target sequence; Australia; Encoding; Evolution (biology); Helium; Information analysis; Machine learning; Predictive models; Proteins; Sequences; Solvents; Domain boundary prediction; domain linker index; machine learning; sequence encoding; sequence profile; Amino Acid Sequence; Computer Simulation; Models, Chemical; Models, Molecular; Molecular Sequence Data; Protein Structure, Tertiary; Proteins; Regression Analysis; Sequence Analysis, Protein;
Journal_Title :
NanoBioscience, IEEE Transactions on
DOI :
10.1109/TNB.2008.2000747