Title :
Memory efficient assembly of human genome
Author :
Hormozdiari, Farhad ; Eskin, Eleazar
Author_Institution :
Dept. of Comp. Sci., Univ. of California Los Angeles, Los Angeles, CA, USA
Abstract :
Summary form only given. The ability to detect genetic variations between two individuals is an essential component of genetic studies. In these studies, obtaining the genome sequence of both individuals is the first step towards variation detection. The emergence of high-throughput sequencing (HTS) technology has made DNA sequencing practical, and HTS is widely used by diagnosticians to increase their knowledge about the causal factors in genetic-related diseases. As HTS advances, more data are generated every day than scientists can process. Genome assembly is one of the existing methods for tackling the variation detection problem. The de Bruijn graph formulation of the assembly problem is widely used in the field; furthermore, it is the only method which can assemble any genome in linear time. However, it requires an enormous amount of memory to assemble any mammalian-size genome. The high demand for sequencing more individuals and the need to assemble their genomes are the driving forces behind a memory-efficient assembler. In this work we propose a novel method which builds the de Bruijn graph while consuming less memory. Our proposed method can reduce memory usage by 37% compared to existing methods. We used a real data set (chromosome 17 of the A/J strain) to illustrate the performance of our method.
Keywords :
DNA; genetics; genomics; graph theory; medical computing; molecular biophysics; storage management; A-J strain chromosome 17; DNA sequencing; HTS technology; de Bruijn graph formulation; diagnostician method; genetic related disease causal factor; genetic study; genetic variation detection; genome assembly problem; genome sequence; high-throughput sequencing technology; human genome; linear time genome assembly; mammalian size genome assembly; memory efficient assembler; memory efficient assembly; memory requirement; memory usage reduction; variation detection problem; Assembly; Bioinformatics; DNA; Genomics; High-temperature superconductors; Sequential analysis; De Bruijn graph; Genome assembly; High-throughput sequencing;
Conference_Title :
Computational Advances in Bio and Medical Sciences (ICCABS), 2013 IEEE 3rd International Conference on
Conference_Location :
New Orleans, LA
DOI :
10.1109/ICCABS.2013.6629227