DocumentCode
2349751
Title
An unsupervised protein sequences clustering algorithm using functional domain information
Author
Chen, Wei-Bang ; Zhang, Chengcui ; Zhong, Hua
Author_Institution
Department of Computer and Information Sciences, University of Alabama at Birmingham, 35294, USA
fYear
2008
fDate
13-15 July 2008
Firstpage
76
Lastpage
81
Abstract
In this paper, we present a novel unsupervised approach to protein sequence clustering that incorporates functional domain information into the clustering process. In the proposed framework, the domain boundaries predicted by the ProDom database are used to provide a better measure of sequence similarity. The kernel of the framework is an unsupervised two-phase clustering algorithm: hierarchical clustering in the first phase pre-clusters the protein sequences, and partitioning clustering in the second phase refines the results. More specifically, in the first phase we perform agglomerative hierarchical clustering on the protein sequences to obtain initial clusters for the subsequent partitioning clustering, and then build a profile Hidden Markov Model (HMM) for each cluster to represent its centroid. In the second phase, HMM-based k-means clustering refines the clusters into protein families. The experimental results show that our model is effective and efficient in clustering protein families.
Keywords
Biomedical measurements; Clustering algorithms; Clustering methods; Data mining; Databases; Hidden Markov models; Kernel; Merging; Partitioning algorithms; Protein sequence; Data Mining and Knowledge Discovery; ProDom database; Profile Hidden Markov Model (HMM); Protein Sequences Clustering;
fLanguage
English
Publisher
ieee
Conference_Titel
Information Reuse and Integration, 2008. IRI 2008. IEEE International Conference on
Conference_Location
Las Vegas, NV, USA
Print_ISBN
978-1-4244-2659-1
Electronic_ISBN
978-1-4244-2660-7
Type
conf
DOI
10.1109/IRI.2008.4583008
Filename
4583008