• DocumentCode
    599147
  • Title
    A novel quasi-alignment-based method for discovering conserved regions in genetic sequences
  • Author
    Nagar, Atulya ; Hahsler, M.
  • Author_Institution
    Comput. Sci. & Eng, Southern Methodist Univ., Dallas, TX, USA
  • fYear
    2012
  • fDate
    4-7 Oct. 2012
  • Firstpage
    662
  • Lastpage
    669
  • Abstract
    This paper presents an alignment-free technique to efficiently discover similar regions in large sets of biological sequences using position-sensitive p-mer frequency clustering. A set of sequences is broken down into segments, and then a frequency distribution over all oligomers of size p (referred to as p-mers) is obtained to summarize each segment. These summaries are clustered while the order of segments in the set of sequences is preserved in a Markov-type model. Sequence segments within each cluster have very similar DNA/RNA patterns and form a so-called quasi-alignment. This fact can be used for a variety of tasks such as species characterization and identification, phylogenetic analysis, functional analysis of sequences and, as in this paper, discovering conserved regions. Our method is computationally more efficient than multiple sequence alignment since it can apply modern data stream clustering algorithms, which run in time linear in the number of segments, and thus can help discover highly similar regions across a large number of sequences. In this paper, we apply the approach to discover and visualize conserved regions in 16S rRNA.
  • Keywords
    DNA; RNA; biology computing; evolution (biological); genetics; molecular biophysics; molecular configurations; 16S rRNA; DNA-RNA patterns; Markov-type model; alignment-free technique; biological sequences; functional analysis; genetic sequences; multiple sequences alignment; phylogenetic analysis; position sensitive p-mer frequency clustering; quasialignment-based method; sequence segments; stream clustering algorithms; Bioinformatics; Buildings; Databases; Genomics; Numerical models; Phylogeny; Visualization; DNA/RNA sequences; conserved sequences; multiple sequence alignment; quasi-alignment;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine Workshops (BIBMW), 2012 IEEE International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    978-1-4673-2746-6
  • Electronic_ISBN
    978-1-4673-2744-2
  • Type
    conf
  • DOI
    10.1109/BIBMW.2012.6470216
  • Filename
    6470216
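
The segment-summarization step described in the abstract (break each sequence into segments, then summarize each segment by its p-mer frequency distribution) might be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the fixed non-overlapping segment length, and the plain ACGT alphabet are all assumptions, and the clustering and Markov-type model steps are not shown.

```python
from collections import Counter
from itertools import product


def segment(seq, size):
    """Split seq into consecutive non-overlapping segments of length `size`.

    Assumption: fixed-length segments; the last one may be shorter.
    """
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def pmer_profile(seg, p, alphabet="ACGT"):
    """Relative frequency of every p-mer over `alphabet`, in lexicographic order.

    p-mers containing symbols outside the alphabet are ignored.
    """
    # Count overlapping windows of length p within the segment.
    counts = Counter(seg[i:i + p] for i in range(len(seg) - p + 1))
    pmers = ["".join(t) for t in product(alphabet, repeat=p)]
    total = sum(counts[m] for m in pmers)
    return [counts[m] / total if total else 0.0 for m in pmers]


def profiles(seq, size, p):
    """One p-mer frequency vector per segment, in segment order."""
    return [pmer_profile(s, p) for s in segment(seq, size)]


if __name__ == "__main__":
    # Hypothetical toy sequence, for illustration only.
    for vec in profiles("ACGTACGTTTGA", size=4, p=2):
        print(vec)
```

Each resulting vector has 4^p entries, so vectors from different segments are directly comparable and could be passed, in segment order, to a clustering algorithm of the reader's choice.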