Title :
Identification and analysis of coding and non-coding regions of a DNA sequence by positional frequency distribution of nucleotides (PFDN) algorithm
Author :
Roy, M. ; Biswas, S. ; Barman, S.
Author_Institution :
Women's Polytech., Gov. of West Bengal, Chandannagar, India
Abstract :
During the last several years, substantial progress has been made in developing high-throughput experimental techniques that produce large amounts of genomic data pertaining to molecular activities in cells. Consequently, a great deal of research is focused on addressing important problems in molecular biology by analyzing these data with mathematical and computational approaches. Genomic signal processing has been an active area of research for the past two decades and has increasingly attracted the attention of digital signal processing researchers all over the world. An important step in genomic annotation is to identify the protein-coding regions of a DNA sequence, especially in the study of eukaryotic genomes. Because no obvious sequence features separate exons from introns, distinguishing protein-coding regions from non-coding regions effectively is a challenging problem. A variety of computational algorithms have been developed to predict exons, and most of them are based on statistical methods. The signal processing approaches of recent years may identify hidden periodicities and features that cannot be revealed easily by conventional statistical methods. In this paper, the authors present an algorithm that separates coding regions from non-coding regions based on the positional frequency distribution of nucleotides (PFDN); the results show that exon regions exhibit more random behavior than intron regions. Such behavior was also observed by FFT power spectrum analysis of DNA sequences. Case studies on genes from different organisms show that the algorithm is an effective approach to exon prediction.
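The FFT power spectrum analysis mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how nucleotides are mapped to numbers, so the binary indicator (Voss-style) mapping used here is an assumption, and the function names `indicator_sequences` and `power_spectrum` are hypothetical. The sketch evaluates the discrete Fourier transform directly with the standard library; an FFT routine would return the same values faster. It is not the PFDN algorithm itself, whose details are not given in this record.

```python
import cmath

def indicator_sequences(seq):
    """Map a DNA string to four binary signals, one per nucleotide.

    Assumed mapping (Voss-style indicators); not confirmed by the paper.
    """
    seq = seq.upper()
    return {base: [1.0 if ch == base else 0.0 for ch in seq] for base in "ACGT"}

def power_spectrum(seq):
    """Return the sum over A, C, G, T of |DFT|^2 at each index k = 0..N-1."""
    n = len(seq)
    spectrum = [0.0] * n
    for signal in indicator_sequences(seq).values():
        for k in range(n):
            # Direct DFT coefficient at frequency index k (O(N) per coefficient).
            coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                        for t, x in enumerate(signal))
            spectrum[k] += abs(coeff) ** 2
    return spectrum

# Toy input (hypothetical, not data from the paper): a strictly 3-periodic string.
toy = "ATG" * 20
spec = power_spectrum(toy)
peak_index = len(toy) // 3  # a period-3 component appears at k = N/3
```

For this toy string all of the non-DC power sits at `k = N/3` and `k = 2N/3`. On real sequences the spectrum is computed over a sliding window, and the strength of the `N/3` component is one commonly used indicator when comparing candidate coding and non-coding regions.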
Keywords :
DNA; fast Fourier transforms; genomics; molecular biophysics; DNA sequence; FFT power spectrum analysis; PFDN algorithm; digital signal processing; eukaryotic genomes; genomic data; genomic signal processing; molecular activity; molecular biology; noncoding region; positional frequency distribution of nucleotides algorithm; Algorithm design and analysis; Bioinformatics; Biomedical signal processing; Frequency; Proteins; Sequences; Signal processing algorithms; Discrete Fourier transform; Fourier transform
Conference_Title :
Computers and Devices for Communication, 2009. CODEC 2009. 4th International Conference on
Conference_Location :
Kolkata
Print_ISBN :
978-1-4244-5073-2