Title :
Prokaryote Gene Data Classifier Design Based on SVM
Author :
Li Xiao-xia ; Sun Bo ; Han Xue-mei ; Zhang Ji-hong
Author_Institution :
Sch. of Inf. Eng., Southwest Univ. of Sci. & Technol., Mianyang, China
Abstract :
Gene recognition is one of the important problems in bioinformatics and has motivated a large body of classic experimental, theoretical and algorithmic research. The E. coli K12 whole genome sequence and gene mark files from GenBank were analyzed for subsequent gene prediction. First, the four distribution types of genes were analyzed. Then, non-coding samples were generated from the intervals between discrete genes, and the training set was constructed from all gene samples and non-gene fragments. Third, the probability densities of the GC ratio and length features of the training samples were estimated and plotted using the Parzen window method. The average GC ratios of the gene and non-coding samples are 0.51 and 0.45, respectively, and their average lengths are 954 and 164 nucleotides, respectively. Finally, a Fisher linear classifier and a support vector machine (SVM) were used to classify gene and non-gene patterns. The results show that the error rate of the least squares support vector machine (LS-SVM) is 14.8%, which is 1.3% lower than that of the Fisher classifier.
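The feature pipeline summarized in the abstract (non-coding fragments from inter-gene intervals, GC ratio and length features, Parzen window density estimation, and a Fisher linear classifier as the baseline) can be sketched in plain Python as below. This is a hypothetical illustration, not the authors' code: the (start, end) 0-based, end-exclusive gene coordinates, the Gaussian Parzen kernel with a hand-picked bandwidth `h`, and all function names are assumptions, and the LS-SVM classifier reported in the paper is not reproduced here.

```python
import math

# Hypothetical sketch; gene coordinates are assumed to be 0-based,
# end-exclusive (start, end) pairs on a single strand.


def gc_ratio(seq):
    """Fraction of G and C nucleotides in a fragment."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def intergenic_fragments(genome, genes):
    """Return the non-coding fragments lying strictly between consecutive genes."""
    frags = []
    ordered = sorted(genes)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end:  # skip overlapping or adjacent genes
            frags.append(genome[prev_end:next_start])
    return frags


def features(seq):
    """Assumed 2-D feature vector: (GC ratio, length in nucleotides)."""
    return (gc_ratio(seq), float(len(seq)))


def parzen_density(x, samples, h):
    """Gaussian Parzen-window estimate of p(x) from 1-D samples, bandwidth h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


def _mean(vs):
    n = len(vs)
    return [sum(v[0] for v in vs) / n, sum(v[1] for v in vs) / n]


def fisher_2d(class_a, class_b):
    """Fit Fisher's discriminant w = Sw^-1 (m_a - m_b) for 2-D features.

    Returns (w, threshold); the threshold is the midpoint of the projected means.
    """
    ma, mb = _mean(class_a), _mean(class_b)
    sw = [[0.0, 0.0], [0.0, 0.0]]  # pooled within-class scatter matrix
    for vs, m in ((class_a, ma), (class_b, mb)):
        for v in vs:
            d = (v[0] - m[0], v[1] - m[1])
            for i in range(2):
                for j in range(2):
                    sw[i][j] += d[i] * d[j]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = (ma[0] - mb[0], ma[1] - mb[1])
    w = (inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1])
    threshold = 0.5 * (w[0] * (ma[0] + mb[0]) + w[1] * (ma[1] + mb[1]))
    return w, threshold


def classify(x, w, threshold):
    """Label a feature vector by which side of the Fisher threshold it projects to."""
    return "gene" if w[0] * x[0] + w[1] * x[1] > threshold else "non-coding"
```

For example, `fisher_2d` can be fitted on gene feature vectors such as `(0.52, 900.0)` and non-coding ones such as `(0.45, 160.0)`, and `classify` then assigns new fragments. Because Fisher's direction is invariant to rescaling each feature, the very different scales of GC ratio and length do not need normalization in this sketch; an SVM would typically need it.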
Keywords :
bioinformatics; genetics; least squares approximations; molecular biophysics; support vector machines; E. coli K12; Fisher linear classifier; GC ratio; GenBank; Parzen window method; gene mark files; gene prediction; gene recognition; least squares SVM; probability density; prokaryote gene data classifier design; support vector machine; whole genome sequence; Bioinformatics; Gene expression; Genomics; Hidden Markov models; Kernel; Least squares methods; Neural networks; Sequences; Support vector machine classification; Support vector machines;
Conference_Title :
Bioinformatics and Biomedical Engineering, 2009. ICBBE 2009. 3rd International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-2901-1
Electronic_ISBN :
978-1-4244-2902-8
DOI :
10.1109/ICBBE.2009.5163250