• DocumentCode
    44913
  • Title

    Improved Multiple Sequence Alignments Using Coupled Pattern Mining

  • Author

    Hossain, K. S. M. Tozammel ; Patnaik, Debprakash ; Laxman, Srivatsan ; Jain, Paril ; Bailey-Kellogg, Chris ; Ramakrishnan, N.

  • Author_Institution
    Dept. of Comput. Sci., Virginia Tech, Blacksburg, VA, USA
  • Volume
    10
  • Issue
    5
  • fYear
    2013
  • fDate
    Sept.-Oct. 2013
  • Firstpage
    1098
  • Lastpage
    1112
  • Abstract
    We present alignment refinement by mining coupled residues (ARMiCoRe), a novel approach to a classical bioinformatics problem, viz., multiple sequence alignment (MSA) of gene and protein sequences. Aligning multiple biological sequences is a key step in elucidating evolutionary relationships, annotating newly sequenced segments, and understanding the relationship between biological sequences and functions. Classical MSA algorithms are designed to primarily capture conservations in sequences whereas couplings, or correlated mutations, are well known as an additional important aspect of sequence evolution. (Two sequence positions are coupled when mutations in one are accompanied by compensatory mutations in another.) As a result, better exposition of couplings is sometimes one of the reasons for hand-tweaking of MSAs by practitioners. ARMiCoRe introduces a distinctly pattern mining approach to improving MSAs: using frequent episode mining as a foundational basis, we define the notion of a coupled pattern and demonstrate how the discovery and tiling of coupled patterns using a max-flow approach can yield MSAs that are better than conservation-based alignments. Although we were motivated to improve MSAs for the sake of better exposing couplings, we demonstrate that our MSAs are also improvements in terms of traditional metrics of assessment. We demonstrate the effectiveness of ARMiCoRe on a large collection of data sets.
  • Keywords
    DNA; bioinformatics; genetics; molecular biophysics; molecular configurations; proteins; biological functions; classical MSA algorithms; classical bioinformatic problem; conservation-based alignments; coupled pattern mining; gene sequences; hand-tweaking; max-flow approach; mining coupled residues; multiple biological sequences; multiple sequence alignment refinement; multiple sequence alignments; pattern mining approach; protein sequences; Amino acids; Bioinformatics; Classification algorithms; Hidden Markov models; Proteins; Sequential analysis; Multiple sequence alignment; coupled patterns; coupled residues; max-flow problems; pattern set mining;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2013.36
  • Filename
    6512489