DocumentCode :
3717138
Title :
Evaluating cloud frameworks on genomic applications
Author :
Michele Bertoni;Stefano Ceri;Abdulrahman Kaitoua;Pietro Pinoli
Author_Institution :
Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
fYear :
2015
Firstpage :
193
Lastpage :
202
Abstract :
We are developing a new, holistic data management system for genomics, which uses cloud-based computing to query thousands of heterogeneous genomic datasets. In our project, it is essential to build on a modern cloud computing framework, so that our query expressions can be encoded into the high-level operations the framework provides. After releasing our first implementation using Pig and Hadoop 1, we are currently targeting Spark and Flink, two emerging frameworks for general-purpose big data analytics. While Spark appears to have a stronger critical mass, Flink supports high-level optimization of data management operations; both systems appear suited to support our domain-specific data management operations. In this paper, we compare the two frameworks at work on three typical genomic applications that stem from our data management requirements; we describe the coding of the genomic applications using Flink and Spark, discuss their common aspects and differences, and comparatively evaluate the performance and scalability of the implementations over datasets consisting of billions of genomic regions.
Keywords :
"Genomics","Bioinformatics","Sparks","Big data","Cloud computing","Encoding","DNA"
Publisher :
IEEE
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/BigData.2015.7363756
Filename :
7363756