DocumentCode :
3237643
Title :
GMM-Based Classification of Genomic Sequences
Author :
Akhtar, Mahmood ; Ambikairajah, Eliathamby ; Epps, Julien
Author_Institution :
Univ. of New South Wales, Sydney
fYear :
2007
fDate :
1-4 July 2007
Firstpage :
103
Lastpage :
106
Abstract :
At present, many techniques based on digital signal processing are available for predicting protein coding regions in genomic sequences. However, accurate identification of these regions at the level of individual nucleotides remains a challenge. In this paper, we propose the novel use of a multi-dimensional feature and Gaussian mixture models to classify nucleotides as protein coding or non-coding. Employing time-domain and frequency-domain features based on signal processing, the system described herein is shown to produce identification accuracies of more than 75% for protein coding nucleotides and more than 79% for non-coding nucleotides when evaluated on the GENSCAN data set.
Keywords :
Gaussian processes; cellular biophysics; feature extraction; genetics; medical computing; molecular biophysics; proteins; signal processing; time-frequency analysis; GENSCAN data set; GMM; Gaussian mixture models; classification; digital signal processing; frequency-domain features; genomic protein coding regions; genomic sequences; multidimensional feature; noncoding nucleotides; time-domain features; Accuracy; Bioinformatics; DNA; Discrete Fourier transforms; Genomics; Multidimensional signal processing; Sequences; Signal processing algorithms; Genomic signal processing; digital filters; discrete cosine transforms
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Digital Signal Processing, 2007 15th International Conference on
Conference_Location :
Cardiff
Print_ISBN :
1-4244-0882-2
Electronic_ISBN :
1-4244-0882-2
Type :
conf
DOI :
10.1109/ICDSP.2007.4288529
Filename :
4288529