  • DocumentCode
    3141217
  • Title
    Implementation of a Scalable Next Generation Sequencing Business Cloud Platform--An Experience Report
  • Author
    Doddavula, Shyam Kumar ; Rani, Madhavi ; Sarkar, Santonu ; Vachhani, Harsh Rajesh ; Jain, Akansha ; Kaushik, Mudit ; Ghosh, Anirban
  • Author_Institution
    Infosys Labs., Bangalore, India
  • fYear
    2011
  • fDate
    4-9 July 2011
  • Firstpage
    598
  • Lastpage
    605
  • Abstract
    The life science industry is looking for new, cost-effective ways to manage and analyze huge amounts of genomic data for faster innovation in drug or biologics discovery. To that end, alliances among competing organizations, such as the Pistoia Alliance, are being formed to collaborate, share a pool of genomic data, and build useful search and analysis techniques for the alliance partners. To make the development and management of data and applications cost-effective, secure cloud computing based platforms are being considered. In this paper we present an experience report on building such a collaborative platform on the Amazon cloud platform. To build a scalable genome sequence alignment solution, we adopted the well-known BLAST framework on the Hadoop platform. A major challenge is that the BLAST executable must be ported as is, yet execution needs to scale as the number of jobs increases by elastically growing the Hadoop infrastructure. We propose a BLAST database partitioning solution to achieve optimal scalability. The results of our controlled experiment are encouraging: they show that job execution scales with the number of jobs if the partition sizes are chosen appropriately.
  • Keywords
    biology computing; cloud computing; data handling; drugs; genomics; pharmaceutical industry; security of data; BLAST database partitioning solution; Hadoop platform; Pistoia Alliance; analysis techniques; biologics discovery; cloud computing based platform security; genome sequence alignment solution; genomic data; life science industry; scalable next generation sequencing business cloud platform; search techniques; Bioinformatics; Indexes; Next generation networking; Organizations; Amazon Cloud; Application Porting; BLAST; Hadoop; Next Generation Sequencing; Scalability
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud Computing (CLOUD), 2011 IEEE International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    2159-6182
  • Print_ISBN
    978-1-4577-0836-7
  • Electronic_ISBN
    2159-6182
  • Type
    conf
  • DOI
    10.1109/CLOUD.2011.60
  • Filename
    6008760