Title of article :
A Complementary Circular Code in the Protein Coding Genes
Author/Authors :
Arquès, Didier G. and Michel, Christian J.
Issue Information :
Journal with serial number, year 1996
Pages :
14
From page :
45
To page :
58
Abstract :
Recently, shifted periodicities 1 modulo 3 and 2 modulo 3 have been identified in protein (coding) genes of both prokaryotes and eukaryotes with autocorrelation functions analysing eight of the 64 trinucleotides (Arquès et al., 1995). This observation suggests that trinucleotides are associated with frames in protein genes. To verify this hypothesis, the distribution of the 64 trinucleotides AAA, ..., TTT is studied in both gene populations using a simple method based on the trinucleotide frequencies per frame. In protein genes, trinucleotides can be read in three frames: the reading frame 0, established by the ATG start trinucleotide, and frame 1 (resp. 2), which is frame 0 shifted by 1 (resp. 2) nucleotide in the 5′–3′ direction. The occurrence frequencies of the 64 trinucleotides are then computed in the three frames. By classifying each trinucleotide in its preferential occurrence frame, i.e. the frame associated with its highest frequency, three subsets of trinucleotides are identified, one per frame. This approach is applied to the two gene populations.

Unexpectedly, the same three subsets of trinucleotides are identified in both gene populations: T0 = X0 ∪ {AAA, TTT} with X0 = {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC} in frame 0, T1 = X1 ∪ {CCC} in frame 1 and T2 = X2 ∪ {GGG} in frame 2, each subset X0, X1 and X2 having 20 trinucleotides. Surprisingly, these three subsets have five important properties: (i) the maximal circular code property for X0 (resp. X1, X2), allowing the automatic retrieval of frame 0 (resp. 1, 2) in any region of a protein gene model (formed by a series of trinucleotides of X0) without using a start codon; (ii) the DNA complementarity property C (e.g. C(AAC) = GTT): C(T0) = T0, C(T1) = T2 and C(T2) = T1, allowing the two paired reading frames of a DNA double helix to code for amino acids simultaneously; (iii) the circular permutation property P (e.g. P(AAC) = ACA): P(X0) = X1 and P(X1) = X2, implying that the two subsets X1 and X2 can be deduced from X0; (iv) the rarity property, with an occurrence probability of X0 equal to 6 × 10⁻⁸; and (v) the concatenation property, with a high frequency (27.5%) of misplaced trinucleotides in the shifted frames, a maximum length of 13 nucleotides for the minimal window needed to retrieve the frame automatically, and the occurrence of all four nucleotide types in the three trinucleotide sites, in favour of an evolutionary code.

In the discussion, mapping the identified subsets T0, T1 and T2 onto the three two-letter genetic alphabets (purine/pyrimidine, amino/keto and strong/weak interaction) allows us to deduce that the RNY model (R = purine = A or G, Y = pyrimidine = C or T, N = R or Y) (Eigen & Schuster, 1978) is the two-letter codon model closest to the trinucleotides of T0. These three subsets are then related to the genetic code. The trinucleotides of T0 code for 13 amino acids: Ala, Asn, Asp, Gln, Glu, Gly, Ile, Leu, Lys, Phe, Thr, Tyr and Val. Finally, a strong correlation is observed between the usage of the trinucleotides of T0 in protein genes and the amino acid frequencies in proteins: six of the seven amino acids not coded by T0 have, as expected, the lowest frequencies in the proteins of both prokaryotes and eukaryotes.
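The method and properties summarised in the abstract can be sketched in Python. This is an illustrative reconstruction, not the authors' code. The helper names (`frame_counts`, `preferential_frames`, `candidate_phases`, `CODON_TABLE`) are hypothetical. The sketch makes three assumptions. First, genes are given as uppercase A/C/G/T strings starting at the A of ATG. Second, the per-frame frequency of a trinucleotide is its count in that frame divided by the total number of trinucleotides read in that frame. Third, X1 = P(X0) and X2 = P(X1). The permutation statement in the source abstract is partly garbled, but the complementarity check C(T1) = T2 below is consistent with that reading.

```python
from collections import Counter
from itertools import product

# Subset X0 as listed in the abstract; T0 = X0 ∪ {AAA, TTT}.
X0 = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}
T0 = X0 | {"AAA", "TTT"}

_PAIR = str.maketrans("ACGT", "TGCA")


def complement(t: str) -> str:
    """C: reverse complement (paired strand read 5'->3'), e.g. C(AAC) = GTT."""
    return t.translate(_PAIR)[::-1]


def permute(t: str) -> str:
    """P: circular permutation moving the first base to the end, e.g. P(AAC) = ACA."""
    return t[1:] + t[0]


# Assumption: X1 = P(X0) and X2 = P(X1), i.e. both subsets are deduced from X0.
X1 = {permute(t) for t in X0}
X2 = {permute(t) for t in X1}
T1 = X1 | {"CCC"}
T2 = X2 | {"GGG"}


def frame_counts(gene: str) -> list[Counter]:
    """Count trinucleotides in frames 0, 1, 2; frame f starts f nucleotides after the A of ATG."""
    counts = [Counter() for _ in range(3)]
    for f in range(3):
        for i in range(f, len(gene) - 2, 3):
            counts[f][gene[i:i + 3]] += 1
    return counts


def preferential_frames(genes: list[str]) -> dict[str, int]:
    """Assign each observed trinucleotide to the frame with its highest relative frequency.

    Frequencies are normalised by the number of trinucleotides read in each frame
    (an assumption); ties go to the lowest frame number.
    """
    totals = [Counter() for _ in range(3)]
    for gene in genes:
        for f, c in enumerate(frame_counts(gene)):
            totals[f].update(c)
    sizes = [max(sum(c.values()), 1) for c in totals]
    observed = set().union(*totals)
    return {t: max(range(3), key=lambda f: totals[f][t] / sizes[f]) for t in observed}


def candidate_phases(window: str, code: set[str] = X0) -> list[int]:
    """Offsets 0, 1, 2 at which every complete trinucleotide of the window lies in `code`."""
    phases = []
    for offset in range(3):
        words = [window[i:i + 3] for i in range(offset, len(window) - 2, 3)]
        if words and all(w in code for w in words):
            phases.append(offset)
    return phases


# Standard genetic code, bases in TCAG order (one-letter amino acids, '*' = stop).
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AA)}


if __name__ == "__main__":
    print("C(T0) == T0:", {complement(t) for t in T0} == T0)
    print("C(T1) == T2:", {complement(t) for t in T1} == T2)
    print("amino acids coded by T0:", len({CODON_TABLE[t] for t in T0}))
    print("phases of GAAGAAGAAGAAG:", candidate_phases("GAAGAAGAAGAAG"))
```

Run as a script, this should print `True`, `True`, `13` and `[0]`. `candidate_phases` is a brute-force check of the circular code property for a single window. Only the reading that yields X0 words throughout survives, so the frame is recovered without a start codon. The sketch does not attempt to reproduce the 13-nucleotide bound, the 27.5% figure or the 6 × 10⁻⁸ probability reported in the abstract.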
Journal title :
Journal of Theoretical Biology
Serial Year :
1996
Record number :
1532969