DocumentCode :
1352783
Title :
Clustering 100,000 Protein Structure Decoys in Minutes
Author :
Li, Shuai Cheng ; Bu, Dongbo ; Li, Ming
Author_Institution :
Dept. of Comput. Sci., City Univ. of Hong Kong, Kowloon, China
Volume :
9
Issue :
3
fYear :
2012
Firstpage :
765
Lastpage :
773
Abstract :
Ab initio protein structure prediction methods first generate large sets of candidate structural conformations (called decoys), and then select the most representative decoys through clustering techniques. Classical clustering methods are inefficient because they compute pairwise distances, and thus become infeasible when the number of decoys is large. In addition, existing clustering approaches suffer from arbitrariness in determining a distance threshold for proteins within a cluster: a small distance threshold leads to many small clusters, while a large distance threshold merges several independent clusters into one. In this paper, we propose an efficient clustering method based on fast estimation of cluster centroids and efficient pruning of rotation spaces. The number of clusters is automatically detected by information distance criteria. A package named ONION, which can be downloaded freely, is implemented accordingly. Experimental results on benchmark data sets suggest that ONION is 14 times faster than existing tools, and that, compared to SPICKER's selections, ONION obtains better selections for 31 targets and worse selections for 19 targets. On an average PC, ONION can cluster 100,000 decoys in around 12 minutes.
Keywords :
ab initio calculations; biology computing; molecular biophysics; molecular configurations; proteins; software packages; ONION package; ab initio protein structure prediction method; benchmark data sets; classical clustering method; cluster centroids; information distance criteria; protein structure decoy clustering; pruning rotation space; structural conformations; Approximation methods; Bioinformatics; Clustering algorithms; Computational biology; Density functional theory; Proteins; Three dimensional displays; Protein structure; clustering; decoy selection; Algorithms; Cluster Analysis; Protein Conformation; Protein Folding; Proteins; Software;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2011.142
Filename :
6051428