Title :
Fast Algorithm for Clustering a Large Number of Protein Structural Decoys
Author :
Zhang, Jingfen ; Xu, Dong
Author_Institution :
Dept. of Comput. Sci., Univ. of Missouri, Columbia, MO, USA
Abstract :
Current protein structure prediction methods often generate a large number of structural candidates (decoys) and then select near-native decoys through clustering. Classical clustering methods for decoys are time-consuming because they calculate pair-wise distances between decoys. In this study, we developed a novel method for very fast decoy clustering. Instead of the commonly used pair-wise RMSD (pRMSD) values, we propose a new distance measure, C-score, based on the contact maps of decoys. Our analysis indicates that C-score and pRMSD are highly correlated and that the clusters obtained from pRMSD and from C-score are highly similar. Our C-score based clustering achieves a calculation time linearly proportional to the number of decoys, while obtaining almost the same accuracy for near-native model selection as existing methods such as SPICKER and Calibur, whose calculation time is quadratic in the number of decoys. Our method has been implemented in a package named MUFOLD-CL, available at http://mufold.org/clustering.php.
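The abstract does not define C-score, so the following is only a minimal sketch of the general idea it describes: represent each decoy by a contact map vector and score decoys without comparing every pair. Everything specific here is an assumption made for illustration and is not taken from MUFOLD-CL: the function names `contact_map_vector` and `rank_by_consensus`, the use of C-alpha coordinates, the 8 Å cutoff, the minimum sequence separation of 3, and the choice to score each decoy by its mean absolute difference from the average contact vector.

```python
import numpy as np


def contact_map_vector(coords, cutoff=8.0, min_sep=3):
    """Flatten a binary contact map into a 1-D vector.

    coords: (L, 3) array of per-residue coordinates (assumed C-alpha).
    cutoff and min_sep are illustrative assumptions, not the paper's values.
    """
    coords = np.asarray(coords, dtype=float)
    # All residue-residue distances within a single decoy.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep only pairs (i, j) with j - i >= min_sep.
    i, j = np.triu_indices(len(coords), k=min_sep)
    return (dist[i, j] < cutoff).astype(float)


def rank_by_consensus(decoys, cutoff=8.0):
    """Rank decoys by closeness to the average contact vector.

    Each decoy is visited a constant number of times, so the cost grows
    linearly with the number of decoys; no decoy-to-decoy distance is
    computed. This scoring rule is a hypothetical stand-in for C-score.
    """
    vectors = np.stack([contact_map_vector(d, cutoff) for d in decoys])
    consensus = vectors.mean(axis=0)
    scores = np.abs(vectors - consensus).mean(axis=1)
    # Lower score means closer to the consensus contact pattern.
    return np.argsort(scores), scores
```

Under these assumptions, `rank_by_consensus` takes a list of equal-length coordinate arrays and returns decoy indices ordered from most to least consensus-like, together with the raw scores. A pairwise RMSD approach would instead need a superposition for every pair of decoys, which is the quadratic cost the abstract refers to.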
Keywords :
biology computing; molecular biophysics; pattern clustering; proteins; C-score distance measure; Calibur method; MUFOLD-CL package; SPICKER method; decoy contact map; near-native model selection; pair-wise RMSD; pairwise distance calculation; protein structural decoy clustering; protein structure prediction method; root mean square deviation; Clustering algorithms; Correlation; Equations; Fitting; Protein engineering; Proteins; Vectors; C-score; Contact Map Vector; Near-native detection; Protein decoy clustering;
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2011 IEEE International Conference on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4577-1799-4
DOI :
10.1109/BIBM.2011.40