Title :
The Application of Support Vector Machine to Operon Prediction
Author :
Wang, Xiumei ; Du, Wei ; Wang, Yan ; Zhang, Chen ; Zhou, Chunguang ; Wang, Shuqin ; Liang, Yanchun
Author_Institution :
Key Lab. of Symbol Comput. & Knowledge Eng. of the Minist. of Educ., Jilin Univ., Changchun, China
Abstract :
In this paper, we apply the least-squares support vector machine (LS-SVM) to operon prediction in Escherichia coli (E. coli), using different combinations of intergenic distance, gene expression data, and phylogenetic profile. Experimental results demonstrate that WO pairs tend to have shorter intergenic distances, higher correlation coefficients, and much stronger co-evolution relationships between their phylogenetic profiles. We preprocessed the data sets extracted from the WOs and TUBs in three ways: we processed the intergenic distances with log-energy entropy, de-noised the Pearson correlation coefficients of the two genes' expression data with a wavelet transform, and computed the Hamming distance between the two phylogenetic profiles. We then trained the LS-SVM on part of the data and tested the trained classifier on the remainder. The results show that the choice of feature combination affects prediction performance. When intergenic distance, gene expression data, and phylogenetic profile are combined as the input to an LS-SVM with a linear kernel, the accuracy, sensitivity, and specificity are 92.34%, 93.54%, and 90.73%, respectively.
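The pairwise features and evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the log-energy entropy follows the common definition (the sum of log(x^2) over nonzero values), and how the paper applies it to intergenic distances is an assumption. The wavelet de-noising step and the LS-SVM itself are omitted.

```python
import math


def hamming_distance(profile_a, profile_b):
    """Count positions where two equal-length phylogenetic profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def log_energy_entropy(values):
    """Sum of log(v^2) over nonzero values (assumed convention: zeros add 0)."""
    return sum(math.log(v * v) for v in values if v != 0)


def classification_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for 0/1 labels.

    Assumes label 1 marks the positive class and that both classes
    appear in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity
```

In this sketch, a gene pair would be described by a feature vector such as (entropy-processed intergenic distance, expression correlation, profile Hamming distance), which a classifier like the LS-SVM would then label as WO or TUB.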
Keywords :
bioinformatics; entropy; genetics; genomics; least squares approximations; microorganisms; support vector machines; wavelet transforms; Escherichia coli prediction; Hamming distance; LS-SVM; Pearson correlation coefficients; gene expression data; genes expression data; intergenic distances; least-square support vector machine; linear kernel type; log-energy entropy; operon prediction; phylogenetic profile; support vector machine; trained classifier model; wavelet transform; Data mining; Entropy; Gene expression; Genetic expression; Kernel; Phylogeny; Sensitivity and specificity; Support vector machines; Testing; Wavelet transforms;
Conference_Title :
Future Generation Communication and Networking, 2008. FGCN '08. Second International Conference on
Conference_Location :
Hainan Island
Print_ISBN :
978-0-7695-3431-2
DOI :
10.1109/FGCN.2008.189