DocumentCode :
1785238
Title :
Whole cancer genome analysis using an I/O aware job scheduler on high performance computing resource
Author :
Junehawk Lee ; Hyojin Kang ; Seokjong Yu ; Chul Kim ; Sang-Jun Yea
Author_Institution :
Nat. Inst. of Supercomput. & Networking, Korea Inst. of Sci. & Technol. Inf., Daejeon, South Korea
fYear :
2014
fDate :
2-5 Nov. 2014
Firstpage :
10
Lastpage :
11
Abstract :
Recent advances in DNA sequencing technology have enabled Next Generation Sequencing (NGS) instruments to generate billions of DNA reads in a few days. However, managing this enormous volume of NGS data and analyzing it concurrently requires a great deal of computing power and memory as well as huge disk storage. Popular job scheduling systems provide efficient ways of managing and scheduling large numbers of analysis jobs based on the available computing resources, but they do not consider the maximum available Input/Output (I/O) bandwidth. Thus, when a large number of genome analyses are executed on a large-scale cluster system, the maximum storage I/O bandwidth is insufficient to utilize all computing resources, and analysis jobs are frequently suspended. Here we developed a disk I/O aware job submission scheduler that maximizes disk I/O usage without hampering previously running jobs through the heavy disk I/O of a new job. We constructed a cancer genome analysis pipeline using our I/O aware scheduler and the HPC resources of the National Institute of Supercomputing and Networking (NISN) to overcome the obstacles to concurrent analysis of vast amounts of NGS data. Using our I/O aware job submission scheduler, we performed major genome analyses on over 50 case-control pairs of whole genome samples from chromophobe renal cell carcinoma patients, sequenced by The Cancer Genome Atlas (TCGA), and successfully completed all analysis jobs with none suspended because of an I/O bottleneck.
Keywords :
DNA; cancer; disc storage; genomics; medical computing; molecular biophysics; molecular configurations; parallel processing; processor scheduling; storage management; DNA sequencing technology; HPC resources; I/O aware job submission scheduler; NGS instruments; TCGA; The Cancer Genome Atlas; chromophobe renal cell carcinoma patients; cluster system; disk storage; high-performance computing resource; next generation sequencing instruments; whole cancer genome analysis; Bandwidth; Bioinformatics; Cancer; DNA; Genomics; Processor scheduling; Sequential analysis; High performance computing; I/O; Lustre; NFS; Whole genome sequencing; job scheduling;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
Conference_Location :
Belfast
Type :
conf
DOI :
10.1109/BIBM.2014.6999391
Filename :
6999391