• DocumentCode
    168644
  • Title
    A Flexible Framework for Asynchronous in Situ and in Transit Analytics for Scientific Simulations
  • Author
    Dreher, Matthieu ; Raffin, Bruno
  • Author_Institution
    LIG, INRIA, Montbonnot, France
  • fYear
    2014
  • fDate
    26-29 May 2014
  • Firstpage
    277
  • Lastpage
    286
  • Abstract
    High performance computing systems are today composed of tens of thousands of processors and deep memory hierarchies. The next generation of machines will further increase the imbalance between I/O capabilities and processing power. To reduce the pressure on I/O, the in situ analytics paradigm proposes to process the data as closely as possible to where and when the data are produced. Processing can be embedded in the simulation code, executed asynchronously on helper cores on the same nodes, or performed in transit on staging nodes dedicated to analytics. Today, software environments as well as usage scenarios still need to be investigated before in situ analytics becomes a standard practice. In this paper we introduce a framework for designing, deploying and executing in situ scenarios. Using a component model, the scientist designs analytics workflows by first developing processing components, which are then assembled into a dataflow graph through a Python script. At runtime the graph is instantiated according to the execution context, with the framework taking care of deploying the application on the target architecture and coordinating the analytics workflows with the simulation execution. Component coordination, zero-copy intra-node communications and inter-node data transfers rely on per-node distributed daemons. We evaluate various scenarios performing in situ and in transit analytics on large molecular dynamics systems simulated with Gromacs using up to 2048 cores. We show in particular that analytics processing can be performed on the fraction of resources the simulation does not use well, resulting in a limited impact on the simulation performance (less than 9%). Our most advanced scenario combines in situ and in transit processing to compute a molecular surface based on the Quicksurf algorithm.
  • Keywords
    data analysis; electronic data interchange; input-output programs; molecular dynamics method; natural sciences computing; parallel processing; workflow management software; Gromacs; I/O capability; Python script; Quicksurf algorithm; analytics workflows; asynchronous in situ analytics; asynchronous in transit analytics; component coordination; dataflow graph; deep memory hierarchy; high performance computing system; internodes data transfers; molecular dynamics system; per-node distributed daemon; processing components; scientific simulation; simulation code; simulation execution; simulation performance; software environments; staging nodes; target architecture; zero-copy intranode communication; Analytical models; Computational modeling; Data models; Data visualization; Numerical models; Ports (Computers); Standards; IO; Molecular Dynamics; In Situ Analytics and Visualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster, Cloud and Grid Computing (CCGrid), 2014 14th IEEE/ACM International Symposium on
  • Conference_Location
    Chicago, IL
  • Type
    conf
  • DOI
    10.1109/CCGrid.2014.92
  • Filename
    6846463