DocumentCode
3237643
Title
GMM-Based Classification of Genomic Sequences
Author
Akhtar, Mahmood ; Ambikairajah, Eliathamby ; Epps, Julien
Author_Institution
Univ. of New South Wales, Sydney
fYear
2007
fDate
1-4 July 2007
Firstpage
103
Lastpage
106
Abstract
At present, many techniques based on digital signal processing are available for predicting protein coding regions in genomic sequences. However, accurately identifying these regions at the level of individual nucleotides remains a challenge. In this paper, we propose the novel use of a multi-dimensional feature and Gaussian mixture models (GMMs) to classify nucleotides as protein coding or non-coding. Employing signal-processing-based time-domain and frequency-domain features, the system described herein is shown to produce identification accuracies of more than 75% for protein coding nucleotides and more than 79% for non-coding nucleotides when evaluated on the GENSCAN data set.
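The two-class GMM decision described in the abstract can be illustrated with a minimal sketch: fit one Gaussian mixture per class, then label a nucleotide by whichever mixture assigns its feature the higher log-likelihood. This is not the authors' implementation. It assumes a single scalar feature per nucleotide (the paper uses a multi-dimensional feature), and the training values, function names, and parameters below are hypothetical and chosen only for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, n_components=2, n_iter=50, var_floor=1e-3):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    srt = sorted(data)
    n = len(srt)
    # Deterministic initialisation: means at evenly spaced quantiles,
    # equal weights, and the overall data variance for every component.
    means = [srt[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    mu = sum(srt) / n
    total_var = max(sum((x - mu) ** 2 for x in srt) / n, var_floor)
    variances = [total_var] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300
            resp.append([pk / s for pk in p])
        # M-step: re-estimate weights, means and (floored) variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, var_floor)
    return weights, means, variances

def log_likelihood(x, gmm):
    """Log of the mixture density at x, guarded against underflow."""
    weights, means, variances = gmm
    p = sum(w * gaussian_pdf(x, m, v)
            for w, m, v in zip(weights, means, variances))
    return math.log(max(p, 1e-300))

def classify(x, coding_gmm, noncoding_gmm):
    """Label a feature value by comparing the two class log-likelihoods."""
    if log_likelihood(x, coding_gmm) > log_likelihood(x, noncoding_gmm):
        return "coding"
    return "non-coding"

# Hypothetical per-nucleotide feature values (made up, not from the paper).
coding_train = [4.8, 5.1, 5.3, 6.9, 7.2, 7.0, 5.0, 7.1]
noncoding_train = [0.1, -0.2, 0.3, 1.1, 0.9, 1.2, 0.0, 1.0]

coding_gmm = fit_gmm_1d(coding_train)
noncoding_gmm = fit_gmm_1d(noncoding_train)

print(classify(6.0, coding_gmm, noncoding_gmm))
print(classify(0.5, coding_gmm, noncoding_gmm))
```

A real system would replace the scalar feature with a feature vector per nucleotide and use multivariate mixtures; the decision rule (compare class-conditional likelihoods) stays the same.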
Keywords
Gaussian processes; cellular biophysics; feature extraction; genetics; medical computing; molecular biophysics; proteins; signal processing; time-frequency analysis; GENSCAN data set; GMM; Gaussian mixture models; classification; digital signal processing; frequency-domain features; genomic protein coding regions; genomic sequences; multidimensional feature; noncoding nucleotides; time-domain features; Accuracy; Bioinformatics; DNA; Digital signal processing; Discrete Fourier transforms; Genomics; Multidimensional signal processing; Proteins; Sequences; Signal processing algorithms; Genomic signal processing; digital filters; discrete Fourier transforms; discrete cosine transforms
fLanguage
English
Publisher
IEEE
Conference_Titel
Digital Signal Processing, 2007 15th International Conference on
Conference_Location
Cardiff
Print_ISBN
1-4244-0882-2
Electronic_ISBN
1-4244-0882-2
Type
conf
DOI
10.1109/ICDSP.2007.4288529
Filename
4288529