Abstract:
We continue our study, Poland [Biophysical Chemistry 110 (2004) 59–2], of the distribution of C or G (C–G for short) in the DNA of select organisms, in particular the tendency for C–G to cluster on all scales with respect to the number of bases considered. We previously found that if we count the number of C–G bases in consecutive, nonoverlapping boxes containing a total of m bases, then the width of the distribution function describing how many C–G bases are in a box increases dramatically with m relative to the width expected for a random distribution. The relative width of the C–G composition distribution function was found to follow a power law in m, the size of the box, to good accuracy over a very wide range of m values. We express the power law in terms of a characteristic exponent γ, that is, the relative widths of the distributions vary as m^γ. The enhanced relative width of the distribution functions is a direct consequence of the tendency for boxes of similar composition to follow one another. This tendency represents persistence in composition from box to box, and hence we refer to γ as the persistence exponent. The occurrence of a power law means that the tendency for C–G to cluster is present on all scales of sequence length (box size) up to the total length of the chromosome, which for bacteria is the entire genome. The persistence exponent γ that characterizes the power law is thus an important parameter describing the distribution of C–G on all scales, from individual base pairs up to the total length of the DNA sample considered. In the present paper, we determine the characteristic exponent γ and the associated fractal dimension of DNA samples for a selection of species representing all of the major types of organism, that is, we explore the phylogeny of the exponent γ.
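As an illustration only, the box-counting procedure described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure used in the paper: the "relative width" is assumed here to be the observed standard deviation of the box counts divided by the binomial width sqrt(m f (1 - f)) expected for a random sequence with overall C–G fraction f, and γ is estimated as the least-squares slope of log(relative width) against log(m). The function names `box_counts`, `relative_width`, and `fit_exponent` are hypothetical.

```python
import math
import random
import statistics


def box_counts(seq, m):
    """Count C or G bases in consecutive, nonoverlapping boxes of m bases.

    A trailing partial box (fewer than m bases) is discarded.
    """
    n_boxes = len(seq) // m
    return [
        sum(1 for base in seq[i * m:(i + 1) * m] if base in "CG")
        for i in range(n_boxes)
    ]


def relative_width(seq, m):
    """Observed width of the box-count distribution over the random expectation.

    Assumption: the random reference is a binomial distribution with the
    sequence's overall C-G fraction f, whose width is sqrt(m * f * (1 - f)).
    Undefined (division by zero) if the sequence is all C-G or has none.
    """
    f = sum(1 for base in seq if base in "CG") / len(seq)
    observed = statistics.pstdev(box_counts(seq, m))
    expected = math.sqrt(m * f * (1.0 - f))
    return observed / expected


def fit_exponent(seq, box_sizes):
    """Estimate gamma as the least-squares slope of log(width) vs log(m)."""
    xs = [math.log(m) for m in box_sizes]
    ys = [math.log(relative_width(seq, m)) for m in box_sizes]
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


# Control case: an uncorrelated synthetic sequence (not real genome data)
# should show no persistence, so the fitted exponent should be close to 0.
random.seed(0)
random_seq = "".join(random.choice("ACGT") for _ in range(200_000))
gamma_random = fit_exponent(random_seq, [16, 32, 64, 128, 256])
print(f"gamma for a random sequence: {gamma_random:.3f}")
```

For an uncorrelated sequence the relative width stays near 1 for every box size, so the fitted slope is near zero; a positive slope on a real sequence would correspond to the persistence described in the text.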
Here we treat six prokaryotes and six eukaryotes, which, together with the species we have previously treated, bring the total number of species we have examined to 15. We find the power law form for the C–G distribution for all of the species treated, and hence this behavior seems to be ubiquitous. The values of the characteristic exponent γ that we find tend to cluster around γ=0.20, with no obvious pattern with respect to phylogeny. The extreme values that we obtain are γ=0.057 (yeast) and γ=0.386 (human). We conclude by showing that the persistence of C–G clustering on the scale of the length of a chromosome is dramatically illustrated by interpreting the C–G distribution as a random walk.
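The random-walk interpretation mentioned above can be illustrated with a short Python sketch. The abstract does not specify the step rule, so the mapping used here is an assumption: each C or G base contributes a step of (1 - f) and each A or T base a step of -f, where f is the overall C–G fraction, so that the mean drift is removed and the walk returns to zero at the end of the sequence. The function name `cg_walk` is hypothetical.

```python
from itertools import accumulate


def cg_walk(seq):
    """Cumulative C-G excess along a sequence, with the mean drift removed.

    Assumed step rule (not taken from the paper): +(1 - f) for C or G and
    -f for any other base, where f is the overall C-G fraction of seq.
    """
    if not seq:
        return []
    f = sum(1 for base in seq if base in "CG") / len(seq)
    steps = [(1.0 - f) if base in "CG" else -f for base in seq]
    return list(accumulate(steps))


# Toy input (not real genome data): a C-G rich half followed by an A-T rich
# half gives a walk that climbs steadily and then descends back to zero.
walk = cg_walk("CCGGAATT")
print(walk)
```

With this convention, a long excursion of the walk away from zero indicates an extended region that is richer or poorer in C–G than the sequence average, whereas an uncorrelated sequence would wander with no such sustained trends.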