Title of article :
Interpreting correlations in biosequences
Author/Authors :
H Herzel, E.N Trifonov, O Weiss, I Große
Issue Information :
Periodical with serial issue number, year 1998
Abstract :
Understanding the complex organization of genomes, predicting the location of genes, and predicting the possible structure of gene products are among the most important problems in current molecular biology. Many statistical techniques are used to address these issues, and correlation functions play a central role among them. This paper is based on an analysis of the decay of the entire 4×4 covariance matrix of DNA sequences. We apply this covariance analysis to human chromosomal regions, yeast DNA, and bacterial genomes and interpret the three most pronounced statistical features (long-range correlations, a period of 3 base pairs, and a period of 10–11 base pairs) using known biological facts about the structure of genomes. For example, we relate the slowly decaying long-range G+C correlations to dispersed repeats and CpG islands. We show quantitatively that the 3-base-pair periodicity is due to the nonuniformity of codon usage in protein-coding segments. Finally, we show that periodicities of 10–11 base pairs in yeast DNA originate from an alternation of hydrophobic and hydrophilic amino acids in protein sequences.
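The abstract does not spell out how the 4×4 covariance matrix is computed. As an illustration only, the sketch below assumes a common definition: for a lag of k bases, the entry for the base pair (a, b) is the observed frequency of finding a at one position and b exactly k positions later, minus the product of the overall frequencies of a and b. The function name `covariance_matrix` and the constant `BASES` are hypothetical names chosen for this example, not taken from the paper.

```python
from collections import Counter

BASES = "ACGT"  # row/column order of the matrix (an assumption of this sketch)


def covariance_matrix(seq, k):
    """Return a 4x4 nested list C with C[a][b] = P_ab(k) - p_a * p_b.

    P_ab(k) is the fraction of position pairs (i, i + k) holding bases
    a and b; p_a and p_b are single-base frequencies over the whole
    sequence. This definition is assumed, not quoted from the paper.
    """
    seq = seq.upper()
    n_pairs = len(seq) - k
    if k < 1 or n_pairs < 1:
        raise ValueError("lag k must satisfy 1 <= k < len(seq)")

    # Count every pair of bases separated by exactly k positions.
    pairs = Counter(zip(seq[:n_pairs], seq[k:]))
    # Count single bases to estimate p_a.
    singles = Counter(seq)
    p = {a: singles[a] / len(seq) for a in BASES}

    return [
        [pairs[(a, b)] / n_pairs - p[a] * p[b] for b in BASES]
        for a in BASES
    ]


# Toy sequence with an exact period of 3 (not real genomic data).
toy = "ACG" * 50
c_lag3 = covariance_matrix(toy, 3)
c_lag1 = covariance_matrix(toy, 1)
```

For this toy sequence the A-A entry is positive at lag 3 and negative at lag 1, a simplified picture of how a period-3 signal appears in the covariance matrix. Scanning k over a range of lags and inspecting how the entries decay would correspond loosely to the analysis the abstract describes.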
Journal title :
Physica A Statistical Mechanics and its Applications