Title :
Practical software for big genomics data
Author_Institution :
Dept. of Comput. Sci., Johns Hopkins Univ., Baltimore, MD, USA
Abstract :
Second-generation DNA sequencers provide an inexpensive and high-resolution window on diverse aspects of biology, genetics, and human disease. In recent years, improvements in per-instrument sequencing throughput have far outpaced improvements in computer speed. This necessitates a computer-science counterattack on two fronts: (1) faster algorithms that make better use of a fixed amount of compute power, and (2) scalable algorithms that make the best possible use of large collections of computers. Here I will discuss past work on both of these fronts, concentrating on the cloud-enabled, scalable software pipelines Crossbow and Myrna. Crossbow is a cloud-enabled tool for aligning short second-generation sequence reads and calling SNP variants. Myrna is a cloud-enabled tool for aligning second-generation sequence reads from two groups (e.g., cancer and normal) and detecting which genes are differentially expressed between the groups. Myrna is an example of how scalable software tools can be used to derive new scientific results (in this case, about the usefulness of certain statistical models for differential expression) from large, already-published datasets. This is further exemplified by ReCount, a database of pre-processed RNA sequencing datasets from 18 different published studies comprising 475 samples and over 8 billion reads.
Keywords :
DNA; biology computing; cancer; cloud computing; genetics; genomics; molecular biophysics; molecular configurations; statistical analysis; Crossbow; Myrna; RNA sequencing dataset preprocessing; ReCount; big genomics data; biology; cloud-enabled scalable software pipelines; computer science; computer speed; database; differential expression; gene detection; high-resolution window; human disease; per-instrument sequencing throughput; practical software; second-generation DNA sequencers; statistical models; Bioinformatics; Computers; Educational institutions; Software; Software algorithms
Conference_Title :
Computational Advances in Bio and Medical Sciences (ICCABS), 2013 IEEE 3rd International Conference on
Conference_Location :
New Orleans, LA
DOI :
10.1109/ICCABS.2013.6629241