Regional Information Center for Science and Technology

DocumentCode :

191015

Title :

In search of perfect reads

Author :

Pal, Soumitra ; Aluru, Srinivas

Author_Institution :

Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Bombay, Mumbai, India

fYear :

2014

fDate :

2-4 June 2014

Firstpage :

Lastpage :

Abstract :

Continued advances in next-generation short-read sequencing technologies are increasing throughput and read lengths while driving down error rates, for example to within 1% for Illumina HiSeq reads. Moreover, errors are not uniformly distributed across reads, and a large percentage of reads are in fact error-free. The ability to predict such perfect reads can have a significant impact on the run-time complexity of applications. In this paper, we present a simple and fast method based on k-spectrum analysis to identify error-free reads. Our experiments show that if around 80% of the reads in a dataset are perfect, our method retains almost 99.9% of them with a precision of more than 90%. Although filtering out the reads that our method identifies as erroneous reduces coverage by about 7% on average, the coverage pattern across the genome remains similar. The filtration process can be customized at several levels of stringency, depending on the needs of the downstream application.
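The abstract names k-spectrum analysis but does not describe the procedure itself. As a rough illustration only, a generic k-spectrum filter can be sketched as below. This is not the authors' method: the rule "keep a read only if every one of its k-mers occurs at least `min_count` times in the dataset", the function names, and the default values of `k` and `min_count` are all assumptions made for this example.

```python
from collections import Counter


def kmers(read, k):
    """Yield every length-k substring (k-mer) of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_spectrum(reads, k):
    """Count how often each k-mer occurs across all reads (the k-spectrum)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def looks_error_free(read, spectrum, k, min_count):
    """Assumed rule: a read passes if all of its k-mers are frequent enough.

    Reads shorter than k contain no k-mers and therefore pass trivially.
    """
    return all(spectrum[km] >= min_count for km in kmers(read, k))


def split_reads(reads, k=4, min_count=2):
    """Partition reads into (kept, flagged) using the assumed rule above."""
    spectrum = build_spectrum(reads, k)
    kept, flagged = [], []
    for read in reads:
        if looks_error_free(read, spectrum, k, min_count):
            kept.append(read)
        else:
            flagged.append(read)
    return kept, flagged


if __name__ == "__main__":
    # Toy data: three identical reads and one with a single substituted base.
    reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTTCGT"]
    kept, flagged = split_reads(reads, k=4, min_count=2)
    print("kept:", kept)
    print("flagged:", flagged)
```

In this sketch, raising `min_count` makes the filter stricter, which loosely corresponds to the adjustable stringency mentioned in the abstract. Realistic values of `k` and `min_count` depend on read length and sequencing depth and are not given in this record.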

Keywords :

error analysis; filtration; genomics; error-free read identification; fast k-spectrum analysis based method; filtration process; next generation short-read sequencing technologies; Accuracy; Bioinformatics; Error correction; Next generation networking; Prediction algorithms; Sequential analysis; Next generation sequencing

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computational Advances in Bio and Medical Sciences (ICCABS), 2014 IEEE 4th International Conference on

Conference_Location :

Miami, FL

Print_ISBN :

978-1-4799-5786-6

Type :

conf

DOI :

10.1109/ICCABS.2014.6863919

Filename :

6863919

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=191015