• DocumentCode
    2764859
  • Title
    Synthesizing Aligned Random Pattern Digraphs from protein sequence patterns
  • Author
    Lee, Annie En-Shiun ; Wong, Andrew K.C.
  • Author_Institution
    Syst. Design Eng., Univ. of Waterloo, Waterloo, ON, Canada
  • fYear
    2011
  • fDate
    12-15 Nov. 2011
  • Firstpage
    178
  • Lastpage
    185
  • Abstract
    An essential step of protein function analysis is to discover patterns that represent functional regions in a set of protein family sequences. However, the same functional region of a protein family that occurs in different sequences may contain variations that result from biological substitutions, deletions, and insertions. Thus, a sequence pattern representing this functional region seldom repeats precisely at the exact position with the same amino acid residues. To capture these variable associations, we developed a pattern synthesis process. First, we used an effective sequence pattern discovery algorithm to discover high-order patterns as input. Next, we grouped and aligned similar discovered patterns into Aligned Random Pattern Clusters (ARPCs). During the clustering process, each ARPC is transformed into a probabilistic structural pattern called the Aligned Random Pattern Digraph (ARPD). The advantages of our synthesis process are that 1) the synthesized patterns are not confined to a fixed protein region, since the ARPCs capture similar patterns by their variable sites, 2) the ARPDs retain both horizontal pattern associations and vertical site variations, and 3) the search space for synthesizing input patterns is smaller than that for aligning input sequences. Without relying on prior knowledge, our method successfully discovers two functional protein regions of the Cytochrome Complex protein family: the proximal and distal binding segments that bind the iron molecule of the heme ligand from each side of the plane.
  • Keywords
    bioinformatics; bonds (chemical); molecular biophysics; molecular configurations; proteins; aligned random pattern digraphs; amino acid residues; clustering process; cytochrome complex protein family; distal binding segment; heme ligand; horizontal pattern associations; iron molecule; probabilistic structural pattern; protein sequence patterns; proximal binding segment; sequence pattern discovery algorithm; vertical site variations; Hidden Markov models; Indexes; Iron; Probabilistic logic; Protein engineering; Proteins; Runtime; Aligned Random Pattern Digraph; Clustering; Pattern; Protein Function Prediction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine Workshops (BIBMW), 2011 IEEE International Conference on
  • Conference_Location
    Atlanta, GA
  • Print_ISBN
    978-1-4577-1612-6
  • Type
    conf
  • DOI
    10.1109/BIBMW.2011.6112372
  • Filename
    6112372