Title :
Variable-Length Protein Sequence Motif Extraction Using Hierarchically-Clustered Hidden Markov Models
Author :
Hudson, Cody ; Chen, Bing
Author_Institution :
Dept. of Comput. Sci., Univ. of Central Arkansas, Conway, AR, USA
Abstract :
Primary sequence motif extraction from protein amino acid sequences is a field of growing importance in bioinformatics due to its relevance to both sequential and structural analysis. Many approaches to motif extraction share two limitations: a reliance on discovering an existing, known protein homologue to perform motif extraction or structural analysis, and an assumed motif length. This work proposes the Hierarchically Clustered-Hidden Markov Model (HC-HMM) approach, which represents the behavior and structure of proteins in terms of a Hidden Markov Model chain and hierarchically clusters the chains by minimizing the distance between the structure and behavior of any two given chains. It is well known that HMMs can be used for clustering; however, methods for clustering Hidden Markov Models themselves are rarely studied. In this paper, we propose a hierarchical-clustering-based algorithm for HMMs to discover protein sequence motifs that transcend family boundaries, with no assumption about the length of the motif. The paper examines the effectiveness of this approach for motif extraction on 2,593 proteins that share no more than 25% sequence identity. Many interesting motifs are generated. Three example motifs generated by the HC-HMM approach are analyzed and visualized with their tertiary structure.
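The idea of hierarchically clustering the models themselves can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the distance measure, the merge rule, or the stopping criterion, so the names `model_distance`, `merge_models`, `cluster_models` and the `threshold` parameter are all hypothetical. Each model is reduced to a transition matrix with the same number of states, which also sidesteps the variable-length aspect of the original work.

```python
from itertools import combinations


def model_distance(a, b):
    """Euclidean (Frobenius) distance between two transition matrices.

    Assumption: a stand-in for the paper's structure-and-behavior distance,
    which the abstract does not define.
    """
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) ** 0.5


def merge_models(a, b, wa, wb):
    """Weighted average of two transition matrices; rows still sum to 1."""
    total = wa + wb
    return [[(wa * x + wb * y) / total for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def cluster_models(models, threshold):
    """Agglomerative clustering: merge the closest pair of models until the
    smallest pairwise distance exceeds `threshold` (a hypothetical cutoff)."""
    # Each cluster is (representative matrix, list of member indices).
    clusters = [(m, [i]) for i, m in enumerate(models)]
    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), model_distance(clusters[i][0], clusters[j][0]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        (ma, ia), (mb, ib) = clusters[i], clusters[j]
        merged = (merge_models(ma, mb, len(ia), len(ib)), ia + ib)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


if __name__ == "__main__":
    # Three toy two-state models; the first two behave almost identically.
    toy = [
        [[0.90, 0.10], [0.20, 0.80]],
        [[0.88, 0.12], [0.22, 0.78]],
        [[0.10, 0.90], [0.70, 0.30]],
    ]
    for _, members in cluster_models(toy, threshold=0.1):
        print(members)
```

In this sketch the two near-identical toy models end up in one cluster and the third stays on its own; a real implementation would also need to compare emission distributions and models of different sizes.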
Keywords :
bioinformatics; hidden Markov models; proteins; distance minimization; hierarchically-clustered hidden Markov models; primary sequence motif extraction; protein amino sequences; protein homologue; protein sequence motifs discovery; sequential analysis; structural analysis; tertiary structure; variable-length protein sequence motif extraction; Amino acids; Data mining; Data models; Databases; Hidden Markov models; Mathematical model; Proteins; Bioinformatics; Hidden Markov Model; Hierarchical Clustering; Sequential Motif
Conference_Title :
Machine Learning and Applications (ICMLA), 2013 12th International Conference on
Conference_Location :
Miami, FL
DOI :
10.1109/ICMLA.2013.37