Title of article :
The performances of the chi-square test and complexity measures for signal recognition in biological sequences
Author/Authors :
Pirhaji, Leila; Kargar, Mehdi; Sheari, Armita; Poormohammadi, Hadi; Sadeghi, Mehdi; Pezeshk, Hamid; Eslahchi, Changiz
Issue Information :
Journal with serial number, year 2008
Pages :
8
From page :
380
To page :
387
Abstract :
With large amounts of experimental data, modern molecular biology needs appropriate methods to deal with biological sequences. In this work, we apply a statistical method (Pearson's chi-square test) to recognize the signals that appear in the whole genome of Escherichia coli. To show the effectiveness of the method, we compare Pearson's chi-square test with linguistic complexity on the complete genome of E. coli. The results suggest that Pearson's chi-square test is an efficient method for distinguishing genes (coding regions) from pseudogenes (noncoding regions). In contrast, the performance of linguistic complexity is much lower than that of the chi-square test. We also use Pearson's chi-square test to determine which parts of the Open Reading Frame (ORF) have a significant effect on discriminating genes from pseudogenes. Moreover, different complexity measures and Pearson's chi-square test are applied to the genes with high values of Pearson's chi-square statistic. We also compute the measures on homologs of these genes. The results illustrate that there is a region near the start codon, with a high value of the chi-square statistic and low complexity, that is conserved between homologous genes.
Keywords :
Low complexity zone , Linguistic complexity , Open reading frame
Journal title :
Journal of Theoretical Biology
Serial Year :
2008
Record number :
1539181