DocumentCode :
3580840
Title :
Identification of single nucleotide polymorphism using support vector machine on imbalanced data
Author :
Hasibuan, Lailan Sahrina ; Kusuma, Wisnu Ananta ; Suwarno, Willy Bayuardi
Author_Institution :
Dept. of Comput. Sci., Bogor Agric. Univ., Bogor, Indonesia
fYear :
2014
Firstpage :
375
Lastpage :
379
Abstract :
Advances in DNA sequencing technology present significant bioinformatics challenges in downstream analyses such as the identification of single nucleotide polymorphisms (SNPs). SNPs are the most abundant form of genetic marker and have been one of the most important research topics in bioinformatics. SNPs have been applied in a wide range of areas, but SNP analysis in plants, such as cultivated soybean (Glycine max L.), remains very limited. This paper discusses the identification of SNPs in cultivated soybean using a Support Vector Machine (SVM). The SVM is trained on positive and negative SNPs. Before training, the positive and negative SNPs were balanced with undersampling and oversampling to obtain the training data. The model trained on balanced data performs better than the model trained on imbalanced data.
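The balancing step mentioned in the abstract can be illustrated with a minimal sketch. The record does not say which undersampling or oversampling variants the authors used, so plain random sampling is assumed here; the function names, the toy feature vectors, and the label encoding (1 = positive SNP, 0 = negative SNP) are all hypothetical. The balanced output would then be passed to an SVM trainer, which is not shown.

```python
import random
from collections import Counter


def _group_by_class(samples, labels):
    """Collect samples into a dict keyed by class label."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class matches the smallest class.

    Assumption: simple random undersampling, not necessarily the paper's method.
    """
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(xs) for xs in by_class.values())
    pairs = [(x, y) for y, xs in by_class.items() for x in rng.sample(xs, target)]
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


def random_oversample(samples, labels, seed=0):
    """Randomly duplicate samples so every class matches the largest class.

    Assumption: simple random oversampling with replacement.
    """
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(xs) for xs in by_class.values())
    pairs = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        pairs.extend((x, y) for x in xs + extra)
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


# Toy imbalanced set: 2 positive candidates, 8 negative candidates.
features = [[float(i), float(i % 3)] for i in range(10)]
classes = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

under_x, under_y = random_undersample(features, classes)
over_x, over_y = random_oversample(features, classes)
under_counts = Counter(under_y)
over_counts = Counter(over_y)
```

Undersampling discards majority-class examples, while oversampling repeats minority-class examples; either way the classifier sees equal class counts during training.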
Keywords :
bioinformatics; genetics; sampling methods; support vector machines; DNA sequencing technology; Glycine max L; SNP analysis; bioinformatics; cultivated soybean; downstream analysis; genetic marker; imbalanced data; negative SNP; oversampling; positive SNP; single nucleotide polymorphism identification; support vector machine; undersampling; Bioinformatics; DNA; Genomics; Sequential analysis; Support vector machines; Testing; Training data; SNP; SVM; identification; oversampling; undersampling;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advanced Computer Science and Information Systems (ICACSIS), 2014 International Conference on
Type :
conf
DOI :
10.1109/ICACSIS.2014.7065854
Filename :
7065854