• DocumentCode
    2010184
  • Title
    Machine Learning in Basecalling -- Decoding Trace Peak Behaviour
  • Author
    Thornley, David; Petridis, Stavros
  • Author_Institution
    Dept. of Comput., Imperial Coll. London
  • fYear
    2006
  • fDate
    28-29 Sept. 2006
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    DNA sequence basecalling is commonly regarded as a solved problem, despite significant error rates being reflected in inaccuracies in databases and genome annotations. These errors commonly arise from an inability to sequence through peak height variations in DNA sequencing traces from the Sanger sequencing method. Recent efforts toward improving basecalling accuracy have taken the form of more sophisticated digital filters and feature detectors. We demonstrate that the variation in peak heights itself encodes novel information which can be used for basecalling. To isolate this information for a clear demonstration, we perform a peculiar blind basecalling experiment using ABI processed output. Using classifiers responding to measurements in the context of the basecalling position, we call bases without reference to the peak heights at the basecalling position itself. Tree classifiers indicate which features are pertinent, and the application of neural nets to these features results in a startlingly high initial success rate of 78%. Our analysis indicates that we can make viable basecalls using information that has never been accessed before.
  • Keywords
    DNA; biology computing; learning (artificial intelligence); neural nets; pattern classification; trees (mathematics); DNA sequence basecalling; Sanger sequencing method; blind basecalling experiment; machine learning; neural nets; trace peak behaviour decoding; tree classifiers; Bioinformatics; Computer vision; DNA; Decoding; Digital filters; Error analysis; Genomics; Machine learning; Sequences; Spatial databases
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Bioinformatics and Computational Biology, 2006. CIBCB '06. 2006 IEEE Symposium on
  • Conference_Location
    Toronto, Ont.
  • Print_ISBN
    1-4244-0623-4
  • Electronic_ISBN
    1-4244-0624-2
  • Type
    conf
  • DOI
    10.1109/CIBCB.2006.330992
  • Filename
    4133174