Author/Authors :
Arquès, Didier G.; Fallot, Jean-Paul; Michel, Christian J.
Abstract :
The subset X0 = {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC} of 20 trinucleotides occurs preferentially in frame 0 (the reading frame established by the ATG start trinucleotide) of protein (coding) genes of both prokaryotes and eukaryotes. This subset X0 has the rare property (rarity 6×10⁻⁸) of being a complementary maximal circular code with two permuted maximal circular codes X1 and X2 in frames 1 and 2 respectively (frame 0 shifted by one and two nucleotides respectively in the 5′-3′ direction). X0 is called a C3 code.
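The complementarity and permutation properties stated above can be checked directly on the 20 trinucleotides. The following is a minimal sketch, not the authors' procedure: it assumes that "complementary" means closed under reverse complementation and that X1 and X2 are obtained from X0 by circularly shifting each trinucleotide by one and two letters (l1l2l3 → l2l3l1). The function names are illustrative. It does not test circularity or maximality.

```python
# The 20 trinucleotides of X0 as listed in the abstract.
X0 = set(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(trinucleotide):
    """Complement each nucleotide, then reverse the order."""
    return trinucleotide.translate(COMPLEMENT)[::-1]


def permute(trinucleotide):
    """Circular shift by one letter: l1 l2 l3 -> l2 l3 l1 (assumed convention)."""
    return trinucleotide[1:] + trinucleotide[0]


def complement_set(code):
    """Set of reverse complements of all trinucleotides in a code."""
    return {reverse_complement(t) for t in code}


# Assumed construction of the two permuted codes.
X1 = {permute(t) for t in X0}
X2 = {permute(t) for t in X1}

if __name__ == "__main__":
    print("X0 self-complementary:", complement_set(X0) == X0)
    print("X1 and X2 complementary to each other:", complement_set(X1) == X2)
    print("pairwise disjoint:", not (X0 & X1 or X0 & X2 or X1 & X2))
```

Under these assumptions the three sets are pairwise disjoint and contain 20 trinucleotides each, so together they cover the 60 trinucleotides other than AAA, CCC, GGG and TTT.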
A quantitative study of these three subsets X0, X1 and X2 in the three frames 0, 1 and 2 of eukaryotic protein genes shows that their occurrence frequencies are constant functions of the trinucleotide positions in the sequences. The frequencies of X0, X1 and X2 in frame 0 of the eukaryotic protein genes are 48.5%, 29% and 22.5% respectively. These properties are not observed in the 5′ and 3′ regions of eukaryotes, where X0, X1 and X2 occur with variable frequencies around the random value (1/3).
Several frequency asymmetries unexpectedly observed, e.g. the frequency difference between X1 and X2 in frame 0, are related to a new property of the C3 code X0 involving substitution. An evolutionary model with three parameters (p, q, k), based on an independent mixing of the 20 codons (trinucleotides in frame 0) of X0 with equiprobability (1/20) followed by k≈5 substitutions per codon in the three codon sites in proportions p≈0.1, q≈0.1 and r=1−p−q≈0.8 respectively, retrieves the frequencies of X0, X1 and X2 observed in the three frames of protein genes and explains these asymmetries.
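The model described above can be explored numerically. The sketch below is a Monte Carlo illustration, not the analytical model of the paper, and it rests on several assumptions that the abstract does not state: each substitution picks a codon site with probabilities (p, q, r) and replaces the nucleotide by one of the three others with equal probability; k is applied as an exact integer count per codon; and X1 and X2 are taken as the circular shifts of X0. All function names and default values other than p, q and k are hypothetical.

```python
import random

X0 = set(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)
# Assumed convention: X1 and X2 are circular shifts of X0 by one and two letters.
X1 = {t[1:] + t[0] for t in X0}
X2 = {t[2:] + t[:2] for t in X0}


def substitute(codon, k, p, q, rng):
    """Apply k substitutions; site chosen with weights (p, q, 1 - p - q)."""
    letters = list(codon)
    for _ in range(k):
        site = rng.choices([0, 1, 2], [p, q, 1.0 - p - q])[0]
        # Assumption: the new nucleotide is uniform over the three others.
        letters[site] = rng.choice([n for n in "ACGT" if n != letters[site]])
    return "".join(letters)


def simulate(n_codons=20000, k=5, p=0.1, q=0.1, seed=0):
    """Mix X0 codons with equiprobability 1/20, then substitute each codon."""
    rng = random.Random(seed)
    codons = sorted(X0)
    mixed = [rng.choice(codons) for _ in range(n_codons)]
    return "".join(substitute(c, k, p, q, rng) for c in mixed)


def class_frequencies(sequence, frame):
    """Fraction of trinucleotides in X0, X1, X2 or neither, read in a frame."""
    counts = {"X0": 0, "X1": 0, "X2": 0, "other": 0}
    for i in range(frame, len(sequence) - 2, 3):
        t = sequence[i:i + 3]
        if t in X0:
            counts["X0"] += 1
        elif t in X1:
            counts["X1"] += 1
        elif t in X2:
            counts["X2"] += 1
        else:
            counts["other"] += 1  # AAA, CCC, GGG or TTT
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    seq = simulate()
    for frame in (0, 1, 2):
        print(frame, class_frequencies(seq, frame))
```

Comparing the printed frequencies for different values of p, q and k gives a rough feel for how unequal substitution rates across the three codon sites can produce asymmetries between X1 and X2. Agreement with the reported values depends on the assumptions listed above.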