• DocumentCode
    1734163
  • Title
    Variable-Length Protein Sequence Motif Extraction Using Hierarchically-Clustered Hidden Markov Models
  • Author
    Hudson, Cody; Chen, Bing
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Central Arkansas, Conway, AR, USA
  • Volume
    1
  • fYear
    2013
  • Firstpage
    173
  • Lastpage
    178
  • Abstract
    Primary sequence motif extraction from protein amino acid sequences is a field of growing importance in bioinformatics because of its relevance to both sequential and structural analysis. Many motif extraction approaches share two limitations: they rely on finding an existing, known protein homologue to perform motif extraction or structural analysis, and they assume a fixed motif length. This work proposes the Hierarchically Clustered Hidden Markov Model (HC-HMM) approach, which represents the behavior and structure of proteins as Hidden Markov Model chains and clusters these chains hierarchically by minimizing the distance between the structure and behavior of any two given chains. It is well known that HMMs can be used for clustering; however, methods for clustering the Hidden Markov Models themselves are rarely studied. In this paper, we propose a hierarchical clustering-based algorithm for HMMs that discovers protein sequence motifs transcending family boundaries, with no assumption on motif length. This paper carefully examines the effectiveness of this approach for motif extraction on 2,593 proteins that share no more than 25% sequence identity. Many interesting motifs are generated; three example motifs produced by the HC-HMM approach are analyzed and visualized together with their tertiary structures.
  • Keywords
    bioinformatics; hidden Markov models; proteins; distance minimization; hierarchically-clustered hidden Markov models; primary sequence motif extraction; protein amino sequences; protein homologue; protein sequence motifs discovery; sequential analysis; structural analysis; tertiary structure; variable-length protein sequence motif extraction; Amino acids; Data mining; Data models; Databases; Hidden Markov models; Mathematical model; Proteins; Bioinformatics; Hidden Markov Model; Hierarchical Clustering; Sequential Motif;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications (ICMLA), 2013 12th International Conference on
  • Conference_Location
    Miami, FL
  • Type
    conf
  • DOI
    10.1109/ICMLA.2013.37
  • Filename
    6784607
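The abstract describes clustering HMM chains hierarchically by the distance between their structure and behavior, but does not say which distance or linkage is used. The following is a minimal, non-authoritative sketch of agglomerative clustering over discrete HMMs, assuming (hypothetically) that "structure" is the transition matrix, "behavior" is the emission matrix, the distance is a weighted sum of Frobenius-norm differences, and clusters are merged by average linkage until the closest pair exceeds a threshold. The names DiscreteHMM, hmm_distance, cluster_hmms, and the parameters w and threshold are illustrative only and are not the authors' HC-HMM implementation. All models are assumed to share the same number of states and the same alphabet (for proteins, typically the 20 amino acids; the usage example uses a toy 2-symbol alphabet).

```python
import math
from dataclasses import dataclass


@dataclass
class DiscreteHMM:
    """Discrete-emission HMM (hypothetical container, not from the paper)."""
    name: str
    A: list  # transitions: A[i][j] = P(state j at t+1 | state i at t)
    B: list  # emissions:   B[i][k] = P(symbol k | state i)


def _frobenius(X, Y):
    """Frobenius norm of X - Y for two equally shaped nested lists."""
    return math.sqrt(sum((x - y) ** 2
                         for row_x, row_y in zip(X, Y)
                         for x, y in zip(row_x, row_y)))


def hmm_distance(h1, h2, w=0.5):
    """Hypothetical distance: w * transition ("structure") difference plus
    (1 - w) * emission ("behavior") difference. Assumes identical shapes."""
    return w * _frobenius(h1.A, h2.A) + (1 - w) * _frobenius(h1.B, h2.B)


def _average_linkage(c1, c2, models, dist):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist(models[i], models[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_hmms(models, threshold, dist=hmm_distance):
    """Agglomerative clustering: repeatedly merge the closest pair of
    clusters (average linkage) until that pair is farther than threshold."""
    clusters = [[i] for i in range(len(models))]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) with index_a < index_b
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = _average_linkage(clusters[a], clusters[b], models, dist)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # no remaining pair is close enough to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because b > a
    return [[models[i].name for i in c] for c in clusters]


if __name__ == "__main__":
    # Toy 2-state, 2-symbol models; s1 and s2 are near-identical, s3 differs.
    h1 = DiscreteHMM("s1", [[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.1, 0.9]])
    h2 = DiscreteHMM("s2", [[0.85, 0.15], [0.2, 0.8]], [[0.7, 0.3], [0.15, 0.85]])
    h3 = DiscreteHMM("s3", [[0.1, 0.9], [0.9, 0.1]], [[0.2, 0.8], [0.9, 0.1]])
    print(cluster_hmms([h1, h2, h3], threshold=0.2))  # [['s1', 's2'], ['s3']]
```

Because the number of clusters is not fixed in advance, the threshold controls how coarse the grouping is; raising it (for example to 10 in the toy case) merges all three models into one cluster. A probabilistic distance, such as a sampled symmetric Kullback-Leibler divergence between HMMs, could be swapped in through the dist parameter without changing the clustering loop.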