DocumentCode
3717138
Title
Evaluating cloud frameworks on genomic applications
Author
Michele Bertoni;Stefano Ceri;Abdulrahman Kaitoua;Pietro Pinoli
Author_Institution
Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milano, Italia
fYear
2015
Firstpage
193
Lastpage
202
Abstract
We are developing a new, holistic data management system for genomics, which uses cloud-based computing for querying thousands of heterogeneous genomic datasets. In our project, it is essential to leverage a modern cloud computing framework, so that our query expressions can be encoded into the high-level operations the framework provides. After releasing our first implementation using Pig and Hadoop 1, we are currently targeting Spark and Flink, two emerging frameworks for general-purpose big data analytics. While Spark appears to have a stronger critical mass, Flink supports high-level optimization for data management operations; both systems appear suited to support our domain-specific data management operations. In this paper, we compare the two frameworks at work on three typical genomic applications that stem from our data management requirements; we describe how the genomic applications are coded in Flink and Spark, discuss their common aspects and differences, and comparatively evaluate the performance and scalability of the implementations over datasets consisting of billions of genomic regions.
Keywords
"Genomics","Bioinformatics","Sparks","Big data","Cloud computing","Encoding","DNA"
Publisher
ieee
Conference_Title
Big Data (Big Data), 2015 IEEE International Conference on
Type
conf
DOI
10.1109/BigData.2015.7363756
Filename
7363756