DocumentCode :
2596120
Title :
A Time Series Approach for Identification of Exons and Introns
Author :
Gupta, Ravi ; Mittal, Ankush ; Singh, Kuldip ; Bajpai, Prateek ; Prakash, Suraj
Author_Institution :
Indian Inst. of Technol. Roorkee, Uttarakhand
fYear :
2007
fDate :
17-20 Dec. 2007
Firstpage :
91
Lastpage :
93
Abstract :
Classifying an organism's gene sequence into coding and non-coding regions is a challenging task in DNA sequence analysis. Classification algorithms rest on the basic assumption that every protein-coding region has distinct sequence features or properties that distinguish it from the surrounding regions, such as non-coding and intergenic regions. In this study, we present a novel and generic approach for the analysis of DNA sequences. A wavelet-based time series approach is proposed for extracting statistical information from DNA sequences. The extracted information captures the variance of the amino/keto, purine/pyrimidine and weak/strong hydrogen bond distributions in a DNA sequence. This variance information is then used to construct a feature vector, and a pattern recognition framework is applied to classify exons and introns. An optimized support vector machine (SVM) classifier based on the novel features is constructed for accurate classification of DNA sequences. Experiments were performed on a dataset of Homo sapiens exons and introns, and a 10-fold cross-validation accuracy of 87.5% was achieved. Tests were also conducted on an unseen dataset of Homo sapiens exons and introns, on which an accuracy of 88.95% was reported.
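The feature-extraction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the wavelet, the number of decomposition levels, or the numeric encoding, so the Haar wavelet, the +1/-1 mapping, the default of three levels, and all function names below are assumptions made for illustration only. The SVM stage is omitted; the resulting vector would simply be passed to a classifier.

```python
import math
from statistics import pvariance

# The three binary nucleotide groupings named in the abstract.
# Bases in each set map to +1, all other bases to -1 (assumed encoding).
GROUPINGS = {
    "amino_keto": set("AC"),         # amino (A, C) vs keto (G, T)
    "purine_pyrimidine": set("AG"),  # purine (A, G) vs pyrimidine (C, T)
    "weak_strong": set("AT"),        # weak (A, T) vs strong (C, G) hydrogen bonding
}


def to_series(seq, positives):
    """Map a DNA string to a +1/-1 numeric series for one grouping."""
    return [1.0 if base in positives else -1.0 for base in seq.upper()]


def haar_detail_levels(series, levels):
    """Return Haar detail coefficients for up to `levels` decomposition levels.

    A trailing unpaired sample at any level is dropped.
    """
    approx = list(series)
    details = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        starts = range(0, len(approx) - 1, 2)
        details.append([(approx[i] - approx[i + 1]) / math.sqrt(2) for i in starts])
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in starts]
    return details


def wavelet_variance_features(seq, levels=3):
    """Build a feature vector of per-level detail-coefficient variances.

    The vector has 3 * levels entries; levels that cannot be computed
    for a short sequence are padded with 0.0.
    """
    features = []
    for positives in GROUPINGS.values():
        details = haar_detail_levels(to_series(seq, positives), levels)
        variances = [pvariance(d) for d in details]
        variances += [0.0] * (levels - len(variances))
        features.extend(variances)
    return features
```

Calling `wavelet_variance_features(sequence)` on each exon or intron would give one fixed-length vector per sequence, which is the form a standard SVM implementation expects as input.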
Keywords :
DNA; biology computing; feature extraction; genetics; molecular biophysics; optimisation; pattern classification; proteins; sequences; statistical analysis; support vector machines; time series; wavelet transforms; DNA sequence analysis; exons-introns identification; feature vector; optimized support vector machine classifier; organism gene sequence classification; pattern recognition; protein coding region; statistical information extraction; wavelet based time series approach; Bonding; Classification algorithms; DNA; Data mining; Hydrogen; Organisms; Proteins; Sequences; Support vector machine classification; Support vector machines;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Technology, (ICIT 2007). 10th International Conference on
Conference_Location :
Orissa
Print_ISBN :
0-7695-3068-0
Type :
conf
DOI :
10.1109/ICIT.2007.54
Filename :
4418274