  • DocumentCode
    2174332
  • Title
    Using Virtual Clusters to Decouple Computation and Data Management in High Throughput Analysis Applications
  • Author
    Leo, Simone; Anedda, Paolo; Gaggero, Massimo; Zanetti, Gianluigi
  • Author_Institution
    CRS4, Pula, Italy
  • fYear
    2010
  • fDate
    17-19 Feb. 2010
  • Firstpage
    411
  • Lastpage
    415
  • Abstract
    The rapid growth in the throughput-to-cost ratio of experimental data production technologies is generating vast amounts of scientific data, often organized into "large" objects (genomes, bio-images) exhibiting complex internal structures. Frequently, datasets must be shared between multiple research groups interested not only in the final results, but also in how they are produced. The practical difficulties of moving terabytes or more of data across the network, as well as the need to maintain a clear separation between software stack and storage infrastructure, are thus raising interest in the use of virtual clusters for HPC and data-intensive applications. In this paper we employ a MapReduce implementation of an image analysis pipeline used by deep sequencing platforms to analyse different virtual cluster scenarios and their impact on system performance.
  • Keywords
    database management systems; distributed processing; virtual machines; workstation clusters; HPC application; MapReduce; data intensive application; data management; dataset; decouple computation; experimental data production technology; high throughput analysis; image analysis; large object; software stack; storage infrastructure; system performance; throughput-to-cost ratio; virtual clusters; Application software; Bioinformatics; Costs; Genomics; Image analysis; Image sequence analysis; Pipelines; Production; Software maintenance; Throughput;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Parallel, Distributed and Network-Based Processing (PDP), 2010 18th Euromicro International Conference on
  • Conference_Location
    Pisa
  • ISSN
    1066-6192
  • Print_ISBN
    978-1-4244-5672-7
  • Electronic_ISBN
    1066-6192
  • Type
    conf
  • DOI
    10.1109/PDP.2010.29
  • Filename
    5452437