• DocumentCode
    3717138
  • Title
    Evaluating cloud frameworks on genomic applications
  • Author
    Michele Bertoni;Stefano Ceri;Abdulrahman Kaitoua;Pietro Pinoli
  • Author_Institution
    Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milano, Italia
  • fYear
    2015
  • Firstpage
    193
  • Lastpage
    202
  • Abstract
    We are developing a new, holistic data management system for genomics, which uses cloud-based computing for querying thousands of heterogeneous genomic datasets. In our project, it is essential to leverage a modern cloud computing framework, so as to encode our query expressions into high-level operations provided by the framework. After releasing our first implementation using Pig and Hadoop 1, we are currently targeting Spark and Flink, two emerging frameworks for general-purpose big data analytics. While Spark appears to have a stronger critical mass, Flink supports high-level optimization for data management operations; both systems appear suited to support our domain-specific data management operations. In this paper, we focus on a comparison of the two frameworks at work based upon three typical genomic applications, stemming from our data management requirements and needs; we describe the coding of the genomic applications using Flink and Spark, discuss their common aspects and differences, and comparatively evaluate the performance and scalability of the implementations over datasets consisting of billions of genomic regions.
  • Keywords
    "Genomics","Bioinformatics","Sparks","Big data","Cloud computing","Encoding","DNA"
  • Publisher
    ieee
  • Conference_Titel
    Big Data (Big Data), 2015 IEEE International Conference on
  • Type
    conf
  • DOI
    10.1109/BigData.2015.7363756
  • Filename
    7363756