• DocumentCode
    419339
  • Title

    Deriving a novel codon index by combining period-3 and fractal features of DNA sequences

  • Author

    Qi, Yan ; Gao, Jianbo ; Cao, Yinhe ; Tung, Wen-wen

  • Author_Institution
    Dept. of Biomed. Eng., Johns Hopkins Univ., Baltimore, MD, USA
  • fYear
    2004
  • fDate
    16-19 Aug. 2004
  • Firstpage
    531
  • Lastpage
    532
  • Abstract
    Summary form only given. A gene-finding algorithm becomes more successful when it incorporates multiple useful, non-redundant sources of information about coding regions. It is thus highly desirable to find new and efficient codon indices. Here we propose a novel codon index, which we call the period-3 fractal deviation (PFD). It is obtained by simultaneously considering two incompatible features of DNA sequences: the period-3 feature in coding regions and the fractal feature in both coding and non-coding regions. These two features are incompatible because period-3 defines a specific scale of three nucleotide bases, whereas fractal behavior means there is no specific scale. The PFD is very different for coding and non-coding sequences, and is reading-frame-dependent. The accuracy of the PFD is evaluated by studying all 16 yeast chromosomes. The percentage accuracy is found to be very high and quite independent of the sliding window size. It is also much higher than when the period-3 and fractal features are characterized alone, especially when the window size is small. This strongly suggests that the method is not only useful for the study of long genome sequences, but may also be very powerful for the study of short DNA segments. The PFD is complementary to other codon indices, including Fourier measures of period-3, which makes it possible to integrate the PFD with other measures. Indeed, integrating the PFD with those indices using Fisher linear discriminant analysis significantly improves the accuracy of protein coding sequence identification. This implies that the measure proposed here may be readily incorporated into existing gene-finding algorithms. Other salient features of the method are that it is non-parametric, does not require training, and can be fully automated.
  • Keywords
    DNA; biology computing; cellular biophysics; fractals; genetics; molecular biophysics; proteins; DNA sequences; Fisher linear discriminant analysis; Fourier measures; amino acid; coding sequences; codon index; fractal features; long genome sequences; noncoding sequences; nucleotide bases; period-3 fractal deviation; protein coding sequence identification; yeast chromosomes; Bioinformatics; Biological cells; DNA; Fractals; Fungi; Genomics; Information resources; Linear discriminant analysis; Phase frequency detector; Sequences
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE
  • Print_ISBN
    0-7695-2194-0
  • Type

    conf

  • DOI
    10.1109/CSB.2004.1332486
  • Filename
    1332486