DocumentCode :
2379404
Title :
In search of true reads: A classification approach to next generation sequencing data selection
Author :
Wijaya, Edward ; Pessiot, Jean-François ; Frith, Martin C. ; Fujibuchi, Wataru ; Asai, Kiyoshi ; Horton, Paul
fYear :
2010
fDate :
18 Dec. 2010
Firstpage :
561
Lastpage :
566
Abstract :
Next generation sequencing (NGS) technology has increasingly become the backbone of transcriptomics analysis, but sequencer error causes biases in the read counts. In this paper we establish a framework for predicting true sequences from NGS data. We formulate this task as a classification problem. We define several features, such as the log-likelihood ratio of estimated true counts, the error probability, and the observed count of the reads. Using a Support Vector Machine (SVM) classifier, we show that on simulated reads these features can achieve 96.35% classification accuracy in discriminating true sequences. Using this framework, we provide a way for users to select sequences with a desired precision and recall for their analysis. The feature generation software and the simulated data set can be obtained from http://seq.cbrc.jp/NGSFeatGen.
Keywords :
DNA; bioinformatics; data analysis; error analysis; feature extraction; maximum likelihood estimation; molecular biophysics; pattern classification; support vector machines; DNA sequencing; SVM classifier; data classification; error probability; estimated true counts; feature generation software; log likelihood ratio; next generation sequencing technology; read counts; sequence prediction; support vector machine; transcriptomics analysis; Illumina; Solexa; classification; expectation maximization; next generation sequencing; transcriptomics
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Biomedicine Workshops (BIBMW), 2010 IEEE International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-8303-7
Electronic_ISBN :
978-1-4244-8304-4
Type :
conf
DOI :
10.1109/BIBMW.2010.5703862
Filename :
5703862