DocumentCode
2010184
Title
Machine Learning in Basecalling -- Decoding Trace Peak Behaviour
Author
Thornley, David ; Petridis, Stavros
Author_Institution
Dept. of Comput., Imperial Coll. London
fYear
2006
fDate
28-29 Sept. 2006
Firstpage
1
Lastpage
8
Abstract
DNA sequence basecalling is commonly regarded as a solved problem, despite significant error rates being reflected in inaccuracies in databases and genome annotations. These errors commonly arise from an inability to sequence through peak height variations in DNA sequencing traces from the Sanger sequencing method. Recent efforts toward improving basecalling accuracy have taken the form of more sophisticated digital filters and feature detectors. We demonstrate that the variation in peak heights itself encodes novel information which can be used for basecalling. To isolate this information for a clear demonstration, we perform a peculiar blind basecalling experiment using ABI processed output. Using classifiers responding to measurements in the context of the basecalling position, we call bases without reference to the peak heights at the basecalling position itself. Tree classifiers indicate which features are pertinent, and the application of neural nets to these features results in a startlingly high initial success rate of 78%. Our analysis indicates that we can make viable basecalls using information that has never been accessed before.
Keywords
DNA; biology computing; learning (artificial intelligence); neural nets; pattern classification; trees (mathematics); DNA sequence basecalling; Sanger sequencing method; blind basecalling experiment; machine learning; trace peak behaviour decoding; tree classifiers; Bioinformatics; Computer vision; Decoding; Digital filters; Error analysis; Genomics; Machine learning; Sequences; Spatial databases
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence and Bioinformatics and Computational Biology, 2006. CIBCB '06. 2006 IEEE Symposium on
Conference_Location
Toronto, Ont.
Print_ISBN
1-4244-0623-4
Electronic_ISBN
1-4244-0624-2
Type
conf
DOI
10.1109/CIBCB.2006.330992
Filename
4133174