DocumentCode
1754994
Title
Designing Template-Free Predictor for Targeting Protein-Ligand Binding Sites with Classifier Ensemble and Spatial Clustering
Author
Dong-Jun Yu ; Jun Hu ; Jing Yang ; Hong-Bin Shen ; Jinhui Tang ; Jing-Yu Yang
Author_Institution
Sch. of Comput. Sci. & Eng., Nanjing Univ. of Sci. & Technol., Nanjing, China
Volume
10
Issue
4
fYear
2013
fDate
July-Aug. 2013
Firstpage
994
Lastpage
1008
Abstract
Accurately identifying protein-ligand binding sites or pockets is of significant importance for both protein function analysis and drug design. Although much progress has been made, challenges remain, especially when the 3D structures of target proteins are not available or no homologous templates can be found in the library, in which case template-based methods are difficult to apply. In this paper, we report a new ligand-specific, template-free predictor called TargetS for targeting protein-ligand binding sites from primary sequences. TargetS first predicts the binding residues along the sequence with a ligand-specific strategy and then identifies the binding sites from the predicted binding residues through a recursive spatial clustering algorithm. Protein evolutionary information, predicted protein secondary structure, and ligand-specific binding propensities of residues are combined to construct discriminative features, and an improved AdaBoost classifier ensemble scheme based on random undersampling is proposed to deal with the severe imbalance between positive (binding) and negative (nonbinding) samples. Experimental results demonstrate that TargetS achieves high performance and outperforms many existing predictors. The TargetS web server and data sets are freely available for academic use at http://www.csbio.sjtu.edu.cn/bioinf/TargetS/.
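The abstract names an "improved AdaBoost classifier ensemble scheme based on random undersampling" without spelling out the improvement. As a rough illustration of the generic idea only, here is a minimal sketch assuming plain discrete AdaBoost with decision stumps, combined in an EasyEnsemble-style way: each ensemble member is trained on all positive (binding) samples plus an equal-sized random draw of negative (nonbinding) samples, and member scores are averaged. This is not TargetS's actual algorithm; the function names, the `n_models` and `rounds` parameters, and the toy feature vectors are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


class Stump:
    """One-feature decision stump that outputs +1 (binding) or -1 (nonbinding)."""

    def __init__(self, feature: int, threshold: float, polarity: int) -> None:
        self.feature, self.threshold, self.polarity = feature, threshold, polarity

    def predict(self, x: Vector) -> int:
        return self.polarity if x[self.feature] >= self.threshold else -self.polarity


def fit_stump(X: List[Vector], y: List[int], w: List[float]) -> Tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest weighted training error."""
    best, best_err = Stump(0, 0.0, 1), float("inf")
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for p in (1, -1):
                s = Stump(f, t, p)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if s.predict(xi) != yi)
                if err < best_err:
                    best, best_err = s, err
    return best, best_err


def adaboost(X: List[Vector], y: List[int], rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost; returns a list of (alpha, stump) pairs."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump.predict(xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def undersampled_ensemble(X: List[Vector], y: List[int], n_models: int = 5,
                          rounds: int = 10, seed: int = 0) -> List[List[Tuple[float, Stump]]]:
    """Train n_models AdaBoost models, each on all positives plus a random,
    equal-sized subset of negatives (random undersampling of the majority class)."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == -1]
    ensemble = []
    for _ in range(n_models):
        idx = pos + rng.sample(neg, min(len(pos), len(neg)))
        ensemble.append(adaboost([X[i] for i in idx], [y[i] for i in idx], rounds))
    return ensemble


def predict(ensemble: List[List[Tuple[float, Stump]]], x: Vector, threshold: float = 0.0) -> int:
    """Average the members' AdaBoost scores and threshold the mean."""
    mean = sum(sum(a * s.predict(x) for a, s in m) for m in ensemble) / len(ensemble)
    return 1 if mean >= threshold else -1


# Toy, made-up data: 5 "binding" (+1) vs. 40 "nonbinding" (-1) feature vectors.
X = [[0.90 + 0.01 * i, 0.5] for i in range(5)] + [[0.10 + 0.01 * i, 0.5] for i in range(40)]
y = [1] * 5 + [-1] * 40
ensemble = undersampled_ensemble(X, y)
print(predict(ensemble, [0.95, 0.5]), predict(ensemble, [0.20, 0.5]))
```

Undersampling keeps every rare positive in each member while discarding different negatives each time, so the ensemble as a whole still sees much of the majority class; averaging member scores is one simple combination rule among several.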
Keywords
bioinformatics; bonds (chemical); learning (artificial intelligence); molecular biophysics; molecular configurations; proteins; sampling methods; sequences; TargetS predictor; accurate pocket identification; accurate protein-ligand binding site identification; binding sample-nonbinding sample imbalance problem; discriminative feature construction; drug design; homology template; improved AdaBoost classifier ensemble scheme; ligand-specific strategy; ligand-specific template-free predictor; positive sample-negative sample imbalance problem; primary sequence; protein evolutionary information; protein function analysis; protein secondary structure prediction; random undersampling; recursive spatial clustering algorithm; residue ligand-specific binding propensity; sequence binding residue prediction; target protein 3D structure; targeting protein-ligand binding site; template-based method application; template-free predictor design; Bioinformatics; Feature extraction; Metals; Protein sequence; Training; Protein-ligand binding sites; classifier ensemble; ligand-specific prediction model; spatial clustering; template-free;
fLanguage
English
Journal_Title
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Publisher
IEEE
ISSN
1545-5963
Type
jour
DOI
10.1109/TCBB.2013.104
Filename
6583160