• DocumentCode
    382196
  • Title
    Visualization of K-tuple distribution in procaryote complete genomes and their randomized counterparts
  • Author
    Xie, Huimin; Hao, Bailin
  • Author_Institution
    Dept. of Math., Suzhou Univ., China
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    31
  • Lastpage
    42
  • Abstract
    We previously (2000) developed a simple scheme to visualize the string composition of long DNA sequences in terms of two- and one-dimensional (2D and 1D) histograms. While the patterns in the 2D histograms are well understood, the structure of the 1D histograms has not been analyzed in detail. It turns out that the structure of the 1D histograms of genomic sequences and of their randomized counterparts varies significantly with the g+c content of the genomes. In particular, the 1D histograms of some randomized sequences may show rich structure, a seemingly counterintuitive result. Three approaches are used to explain this phenomenon: (1) Monte Carlo simulation, (2) exact computation using the Goulden-Jackson cluster method, and (3) a Poisson approximation method. The last approach explains the multimodal structure of the K-histograms well.
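A minimal Python sketch of why a randomized sequence can produce a structured 1D K-histogram. It is not the authors' code. It assumes that the randomized counterpart is modelled as i.i.d. letters, with a and t each drawn with probability (1 - gc)/2 and c and g each with gc/2. The function names `random_genome`, `k_histogram` and `poisson_mixture` are hypothetical. Under that assumption, every K-tuple with the same number j of c/g letters has the same expected count, so the histogram is approximately a mixture of K + 1 Poisson laws. This approximation ignores overlap dependencies between K-tuples, which the exact Goulden-Jackson method would account for.

```python
import math
import random
from collections import Counter
from itertools import product

ALPHABET = "acgt"


def random_genome(n, gc, seed=0):
    """Hypothetical randomized counterpart: n i.i.d. letters with g+c fraction gc.

    Assumes a and t each have probability (1 - gc) / 2, and c and g each gc / 2.
    """
    rng = random.Random(seed)
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # order: a, c, g, t
    return "".join(rng.choices(ALPHABET, weights=weights, k=n))


def k_histogram(seq, k):
    """1D K-histogram: maps a count c to how many of the 4**k K-tuples occur exactly c times.

    Overlapping K-tuples are counted; tuples absent from seq contribute to c = 0.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(counts.get("".join(t), 0) for t in product(ALPHABET, repeat=k))


def poisson_mixture(n, k, gc, c_max):
    """Approximate expected 1D K-histogram for the i.i.d. model above (requires 0 < gc < 1).

    K-tuples with the same number j of c/g letters share one expected count lam,
    so the histogram is approximated by a mixture of k + 1 Poisson laws. Overlap
    dependencies between K-tuples are ignored, which is what makes this approximate.
    """
    positions = n - k + 1
    expected = [0.0] * (c_max + 1)
    for j in range(k + 1):
        n_tuples = math.comb(k, j) * 2 ** k  # K-tuples with exactly j letters from {c, g}
        lam = positions * (gc / 2) ** j * ((1 - gc) / 2) ** (k - j)
        for c in range(c_max + 1):
            # Poisson pmf computed in log space to avoid overflow for large c
            log_pmf = -lam + c * math.log(lam) - math.lgamma(c + 1)
            expected[c] += n_tuples * math.exp(log_pmf)
    return expected
```

For example, take n = 100_000, K = 5 and gc = 0.7. The six Poisson means are then roughly 8, 18, 41, 96, 225 and 525, so the mixture is multimodal. At gc = 0.5 all six means coincide and a single peak is expected. Comparing `poisson_mixture` with `k_histogram(random_genome(...), 5)` gives a Monte Carlo check of the approximation.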
  • Keywords
    DNA; Monte Carlo methods; Poisson distribution; approximation theory; biology computing; data visualisation; 1D histograms; 2D histograms; DNA sequences; Goulden-Jackson cluster method; Monte Carlo simulation; Poisson approximation; genomic sequences; randomized sequences; string composition; Bioinformatics; Computer displays; Counting circuits; DNA; Genomics; Histograms; Mathematics; Pattern analysis; Sequences; Visualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society
  • Print_ISBN
    0-7695-1653-X
  • Type
    conf
  • DOI
    10.1109/CSB.2002.1039327
  • Filename
    1039327