• DocumentCode
    1984397
  • Title

    Scalable and Resilient Workflow Executions on Production Distributed Computing Infrastructures

  • Author

    Balderrama, Javier Rojas; Huu, Tram Truong; Montagnat, Johan

  • Author_Institution
    I3S Lab., Univ. of Nice-Sophia Antipolis, Nice, France
  • fYear
    2012
  • fDate
    25-29 June 2012
  • Firstpage
    119
  • Lastpage
    126
  • Abstract
    In spite of the growing interest in grid and cloud infrastructures among scientific communities and the availability of such facilities at large scale, achieving high performance in production environments remains challenging due to at least four factors: the low reliability of very large-scale distributed computing infrastructures, the performance overhead induced by shared facilities, the difficulty of obtaining a fair balance among all user jobs in such a heterogeneous environment, and the complexity of deploying large-scale distributed applications. Together, these difficulties make infrastructure exploitation complex and often limited to experts. This paper introduces a pragmatic solution to these four issues based on a service-oriented methodology, the reuse of existing middleware services, and the joint exploitation of local and distributed computing resources. Emphasis is placed on the ease of use of the integrated environment. Results on an actual neuroscience application show the impact of the environment setup in terms of reliability and performance. Recommendations and best practices are derived from this experiment.
  • Keywords
    cloud computing; grid computing; middleware; natural sciences computing; service-oriented architecture; software performance evaluation; cloud infrastructures; grid infrastructures; middleware services; neuroscience application; performance overhead; pragmatic solution; production distributed computing infrastructures; resilient workflow executions; scalable workflow executions; scientific communities; service-oriented methodology; shared facilities; Diseases; Production; Reliability; Servers; Service oriented architecture; Distributed Computing Infrastructure; Grid Computing; Scientific Workflow; Service Oriented Architecture;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Computing (ISPDC), 2012 11th International Symposium on
  • Conference_Location
    Munich/Garching, Bavaria
  • Print_ISBN
    978-1-4673-2599-8
  • Type

    conf

  • DOI
    10.1109/ISPDC.2012.24
  • Filename
    6341502