Title :
A new method for clustering multi-domain protein sequences
Author :
Hongzhou He ; Mingtian Zhou
Author_Institution :
Coll. of Math. & Comput. Sci., Mianyang Normal Univ., Mianyang, China
Abstract :
A new method for clustering multi-domain protein sequences is proposed. It revises the preference value of the classical affinity propagation (AP) algorithm in combination with the Silhouette index of clustering validity. In addition, the classical substitution match similarity (SMS) between two protein sequences is generalized to meet the demands of clustering "twilight zone" protein sequences. Experimental results on four test datasets show that the method yields a number of clusters closer to the number of families identified by phylogenetic trees, a more consistent clustering structure for a given protein dataset, and a comparative advantage in clustering multi-domain protein sequences.
Keywords :
biology computing; genomics; pattern clustering; proteins; trees (mathematics); affinity propagation algorithm; clustering validity; multidomain protein sequences clustering; phylogenetic trees; silhouette index; substitution match similarity; clustering; protein sequences; revised affinity propagation (RAP); similarity measure;
Conference_Title :
Information Science and Control Engineering 2012 (ICISCE 2012), IET International Conference on
Conference_Location :
Shenzhen
Electronic_ISBN :
978-1-84919-641-3
DOI :
10.1049/cp.2012.2330