• DocumentCode
    244623
  • Title
    An automated infrastructure to support high-throughput bioinformatics
  • Author
    Cuccuru, Gianmauro ; Leo, Simone ; Lianas, Luca ; Muggiri, Michele ; Pinna, Andrea ; Pireddu, Luca ; Uva, Paolo ; Angius, Alessio ; Fotia, Giorgio ; Zanetti, Gianluigi
  • Author_Institution
    CRS4, Pula, Italy
  • fYear
    2014
  • fDate
    21-25 July 2014
  • Firstpage
    600
  • Lastpage
    607
  • Abstract
    The number of domains affected by the big data phenomenon is constantly increasing, both in science and industry, with high-throughput DNA sequencers being among the most massive data producers. Building analysis frameworks that can keep up with such a high production rate, however, is only part of the problem: current challenges include dealing with articulated data repositories where objects are connected by multiple relationships, managing complex processing pipelines where each step depends on a large number of configuration parameters, and ensuring reproducibility, error control and usability by non-technical staff. Here we describe an automated infrastructure built to address the above issues in the context of the analysis of the data produced by the CRS4 next-generation sequencing facility. The system integrates open source tools, either written by us or publicly available, into a framework that can handle the whole data transformation process, from raw sequencer output to primary analysis results.
  • Keywords
    Big Data; bioinformatics; CRS4 next generation sequencing facility; automated infrastructure; building analysis frameworks; data repositories; data transformation process; error control; high throughput DNA sequencers; high throughput bioinformatics; massive data producers; open source tools; raw sequencer output; reproducibility; usability; Genomics; Muscles; Simple object access protocol; MapReduce; NGS
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing & Simulation (HPCS), 2014 International Conference on
  • Conference_Location
    Bologna
  • Print_ISBN
    978-1-4799-5312-7
  • Type
    conf
  • DOI
    10.1109/HPCSim.2014.6903742
  • Filename
    6903742