• DocumentCode
    2378166
  • Title
    On-line hierarchy of general linear models for selecting and ranking the best predicted protein structures
  • Author
    Girgis, Hani Zakaria ; Corso, Jason J. ; Fischer, Daniel
  • Author_Institution
    Comput. Sci. Dept., Johns Hopkins Univ., Baltimore, MD, USA
  • fYear
    2009
  • fDate
    3-6 Sept. 2009
  • Firstpage
    4949
  • Lastpage
    4953
  • Abstract
    To predict the three-dimensional structure of proteins, many computational methods sample the conformational space, generating a large number of candidate structures. Such methods then rank the generated structures using a variety of model quality assessment programs to obtain a small set of structures that are most likely to resemble the unknown experimentally determined structure. Model quality assessment programs suffer from two main limitations: (i) the rank-one structure is not always the best predicted structure (for example, the best predicted structure could be ranked 10th); and (ii) no single assessment method can correctly rank the predicted structures for all target proteins. However, because at least some of the methods often achieve a good ranking, a model quality assessment method based on a consensus of several such methods is likely to perform better. We have devised the STPdata algorithm, a consensus method based on five model quality assessment programs. We have applied it to build an on-line "custom-trained" hierarchy of general linear models to select and rank the best predicted structures. By "custom-trained", we mean that for each target protein the STPdata algorithm trains a unique model on data related to the input target protein. To evaluate our method, we participated in CASP8 as human predictors. In CASP8, the STPdata algorithm trained 128 hierarchical models, one for each of the 128 target proteins. Based on the official results of CASP8, our method outperformed the best server by 6% and placed fourth among human predictors. Our CASP results are based purely on computational methods, without any human intervention.
  • Keywords
    biology computing; molecular biophysics; proteins; STPdata algorithm; best predicted protein structures; consensus method; general linear models; model quality assessment method; online custom-trained hierarchy; Algorithms; Animals; Computational Biology; Humans; Linear Models; Models, Theoretical; Protein Conformation; Proteins;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Engineering in Medicine and Biology Society, 2009. EMBC 2009. Annual International Conference of the IEEE
  • Conference_Location
    Minneapolis, MN
  • ISSN
    1557-170X
  • Print_ISBN
    978-1-4244-3296-7
  • Electronic_ISBN
    1557-170X
  • Type
    conf
  • DOI
    10.1109/IEMBS.2009.5332706
  • Filename
    5332706