DocumentCode :
2039210
Title :
Improving the flexibility of RNA-Seq data analysis pipelines
Author :
Phan, John H. ; Wu, Po-Yen ; Wang, May Dongmei
Author_Institution :
Dept. of Biomed. Eng., Georgia Inst. of Technol. & Emory Univ., Atlanta, GA, USA
fYear :
2012
fDate :
2-4 Dec. 2012
Firstpage :
70
Lastpage :
73
Abstract :
Accurate quantification of gene or isoform expression with RNA-Seq depends on complete knowledge of the transcriptome. Because a complete genomic annotation does not yet exist, novel isoform discovery is an important component of the RNA-Seq quantification process. Thus, a typical RNA-Seq pipeline includes a transcriptome mapping step to quantify known genes and isoforms, and a reference genome mapping step to discover new genes and isoforms. Several tools implement this approach, but are limited in that they force the use of a single mapping algorithm at both the transcriptome and reference genome mapping stages. The choice of mapping algorithm could affect quantification accuracy on a per-dataset basis. Thus, we describe a method that enables the merging of transcriptome and reference genome mapping stages provided that they conform to the standard SAM/BAM format. This procedure could potentially improve the accuracy of gene or isoform quantification by increasing flexibility when selecting RNA-Seq data analysis pipelines. We demonstrate an example of a flexible RNA-Seq pipeline, assess its potential for novel isoform discovery and validate its quantification performance using qRT-PCR.
Keywords :
RNA; biology computing; data analysis; enzymes; genetics; genomics; molecular biophysics; molecular configurations; RNA-Seq data analysis pipeline flexibility; RNA-Seq quantification process; gene expression; gene quantification; isoform expression; isoforms quantification; novel isoform discovery; per-dataset basis; reference genome mapping stages; reverse transcription polymerase chain reaction; single mapping algorithm; standard SAM-BAM format; transcriptome mapping; Next-generation sequencing; RNA-seq; data analysis pipeline; spliced mapping
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Genomic Signal Processing and Statistics (GENSIPS), 2012 IEEE International Workshop on
Conference_Location :
Washington, DC
ISSN :
2150-3001
Print_ISBN :
978-1-4673-5234-5
Type :
conf
DOI :
10.1109/GENSIPS.2012.6507729
Filename :
6507729