DocumentCode :
684744
Title :
A new method for clustering multi-domain protein sequences
Author :
Hongzhou He ; Mingtian Zhou
Author_Institution :
Coll. of Math. & Comput. Sci., Mianyang Normal Univ., Mianyang, China
fYear :
2012
fDate :
7-9 Dec. 2012
Firstpage :
1
Lastpage :
5
Abstract :
A new method for clustering multi-domain protein sequences is proposed. It revises the preference value of the classical affinity propagation (AP) algorithm in combination with the Silhouette index of clustering validity. In addition, the classical substitution match similarity (SMS) between two protein sequences is generalized to meet the demands of clustering 'twilight zone' protein sequences. Experimental results on four test datasets show that the method yields a number of clusters closer to the number of families given by phylogenetic trees, a more consistent clustering structure for a given protein dataset, and a comparative advantage in clustering multi-domain protein sequences.
Keywords :
biology computing; genomics; pattern clustering; proteins; trees (mathematics); affinity propagation algorithm; clustering validity; multidomain protein sequences clustering; phylogenetic trees; silhouette index; substitution match similarity; clustering; protein sequences; revised affinity propagation (RAP); similarity measure;
fLanguage :
English
Publisher :
IET
Conference_Titel :
Information Science and Control Engineering 2012 (ICISCE 2012), IET International Conference on
Conference_Location :
Shenzhen
Electronic_ISBN :
978-1-84919-641-3
Type :
conf
DOI :
10.1049/cp.2012.2330
Filename :
6755709