  • DocumentCode
    1240103
  • Title
    DomNet: Protein Domain Boundary Prediction Using Enhanced General Regression Network and New Profiles
  • Author
    Yoo, Paul D. ; Sikder, Abdur R. ; Taheri, Javid ; Zhou, Bing Bing ; Zomaya, Albert Y.
  • Author_Institution
    Adv. Networks Res. Group, Univ. of Sydney, Sydney, NSW
  • Volume
    7
  • Issue
    2
  • fYear
    2008
  • fDate
    6/1/2008 12:00:00 AM
  • Firstpage
    172
  • Lastpage
    181
  • Abstract
    The accurate and stable prediction of protein domain boundaries is an important avenue for the prediction of protein structure, function, evolution, and design. Recent research on protein domain boundary prediction has been mainly based on widely known machine learning techniques. In this paper, we propose a new machine-learning-based domain predictor, DomNet, which shows more accurate and stable predictive performance than existing state-of-the-art models. DomNet is trained using a novel compact domain profile, secondary structure, solvent accessibility information, and an interdomain linker index to detect possible domain boundaries for a target sequence. The performance of the proposed model was compared with nine different machine learning models on the Benchmark_2 dataset in terms of accuracy, sensitivity, specificity, and correlation coefficient. DomNet achieved the best performance, with 71% accuracy for domain boundary identification in multidomain proteins. On the CASP7 benchmark dataset, it again demonstrated superior performance to contemporary domain boundary predictors such as DOMpro, DomPred, DomSSEA, DomCut, and DomainDiscovery.
  • Keywords
    biochemistry; biology computing; learning (artificial intelligence); molecular biophysics; molecular configurations; neural nets; proteins; CASP7 benchmark dataset; DomNet; correlation coefficient; enhanced general regression neural network; interdomain linker index; machine learning techniques; protein design; protein domain boundary prediction; protein evolution; protein function; protein structure; secondary structure; solvent accessibility information; target sequence; Australia; Encoding; Evolution (biology); Helium; Information analysis; Machine learning; Predictive models; Proteins; Sequences; Solvents; Domain boundary prediction; domain linker index; machine learning; sequence encoding; sequence profile; Amino Acid Sequence; Computer Simulation; Models, Chemical; Models, Molecular; Molecular Sequence Data; Protein Structure, Tertiary; Proteins; Regression Analysis; Sequence Analysis, Protein
  • fLanguage
    English
  • Journal_Title
    NanoBioscience, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1536-1241
  • Type
    jour
  • DOI
    10.1109/TNB.2008.2000747
  • Filename
    4538003