DocumentCode :
1785110
Title :
Fastq_clean: An optimized pipeline to clean the Illumina sequencing data with quality control
Author :
Mi Zhang ; Feng Zhan ; Honghe Sun ; Xiujun Gong ; Zhangjun Fei ; Shan Gao
Author_Institution :
Sch. of Comput. Sci. & Technol., Tianjin Univ., Tianjin, China
fYear :
2014
fDate :
2-5 Nov. 2014
Firstpage :
44
Lastpage :
48
Abstract :
The usability of NGS technologies relies heavily on the accuracy of the data. Many research groups have developed programs to clean raw sequencing data by removing adapter contamination and trimming low-quality nucleotides. However, these programs are not optimized to process data from any specific equipment. In this study, we present Fastq_clean, an optimized pipeline to clean DNA-seq and RNA-seq data from the Illumina sequencer. Fastq_clean removes low-quality nucleotides and adapter contamination precisely while keeping as many of the qualified nucleotides as possible. By running a single command line, Fastq_clean can process sequenced data in batches and export statistics for data quality control (QC). Compared with the two most widely used tools on a published dataset, Fastq_clean achieved the best performance. Fastq_clean has already been used successfully in some genome or transcriptome projects. It can also be used to clean NGS data from other sequencers (e.g. 454), but it needs some modification to reach the best performance.
Keywords :
DNA; RNA; bioinformatics; electronic data interchange; genomics; molecular biophysics; molecular configurations; optimisation; pipeline processing; quality control; sequences; statistical analysis; 454 sequencer; DNA-seq data cleaning; Fastq_clean; Illumina sequencing data cleaning; NGS data cleaning; NGS technology usability; RNA-seq data cleaning; adapter contamination removal; batch sequenced data processing; command line; data QC; data accuracy; data processing optimization; data quality control; genome project; illumina sequencer; low quality nucleotide removal; low-quality nucleotide trimming; optimized pipeline; qualified nucleotide; sequenced raw data cleaning; statistics information exportation; transcriptome project; Bioinformatics; Cleaning; Contamination; Databases; Educational institutions; Genomics; Sequential analysis; NGS; data cleaning; data quality; deep sequencing; pipeline;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
Conference_Location :
Belfast
Type :
conf
DOI :
10.1109/BIBM.2014.6999309
Filename :
6999309