Title :
Delivering bioinformatics MapReduce applications in the cloud
Author :
Forer, Lukas ; Lipic, Tomislav ; Schonherr, Sven ; Weisensteiner, Hansi ; Davidovic, Davor ; Kronenberg, Florian ; Afgan, Enis
Author_Institution :
Div. of Genetic Epidemiology, Med. Univ. of Innsbruck, Innsbruck, Austria
Abstract :
The ever-increasing production and availability of data in bioinformatics demand a paradigm shift towards novel solutions for efficient data storage and processing, such as the MapReduce data-parallel programming model and the corresponding Apache Hadoop framework. Despite the evident potential of this model and the existence of available algorithms and applications, especially for batch processing of large data sets as in next-generation sequencing analysis, bioinformatics MapReduce applications have yet to be widely adopted in bioinformatics data analysis. We identify two prerequisites for their adoption and utilization: (1) the ability to compose complex workflows from multiple bioinformatics MapReduce tools, abstracting the technical details of how those tools are combined and executed so that bioinformatics domain experts can focus on the analysis, and (2) the availability of accessible and flexible computing infrastructure for this type of data processing. This paper presents the integration of two existing systems: Cloudgene, a bioinformatics MapReduce workflow framework, and CloudMan, a cloud manager for delivering application execution environments. Together, they enable delivery of bioinformatics MapReduce applications in the cloud.
Keywords :
batch processing (computers); bioinformatics; cloud computing; data analysis; parallel programming; Apache Hadoop framework; CloudMan; Cloudgene; MapReduce data parallel programming model; application execution environments; batch processing; bioinformatics MapReduce applications; bioinformatics MapReduce workflow framework; bioinformatics data analysis; cloud manager; data processing; data production; data storage; flexible computing infrastructure; large data sets; multiple bioinformatics MapReduce tools; next generation sequencing analysis; Bioinformatics; Biological system modeling; Cloud computing; Computational modeling; Data analysis; Genomics; Sequential analysis
Conference_Title :
2014 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)
Conference_Location :
Opatija
Print_ISBN :
978-953-233-081-6
DOI :
10.1109/MIPRO.2014.6859593