• DocumentCode
    1783215
  • Title
    CALCioM: Mitigating I/O Interference in HPC Systems through Cross-Application Coordination
  • Author
    Dorier, Matthieu ; Antoniu, Gabriel ; Ross, Robert ; Kimpe, Dries ; Ibrahim, Shadi
  • Author_Institution
    ENS Cachan Brittany, IRISA, Rennes, France
  • fYear
    2014
  • fDate
    19-23 May 2014
  • Firstpage
    155
  • Lastpage
    164
  • Abstract
    Unmatched computation and storage performance in new HPC systems have led to a plethora of I/O optimizations ranging from application-side collective I/O to network and disk-level request scheduling on the file system side. As we deal with ever larger machines, the interference produced by multiple applications accessing a shared parallel file system in a concurrent manner becomes a major problem. Interference often breaks single-application I/O optimizations, dramatically degrading application I/O performance and, as a result, lowering machine-wide efficiency. This paper focuses on CALCioM, a framework that aims to mitigate I/O interference through the dynamic selection of appropriate scheduling policies. CALCioM allows several applications running on a supercomputer to communicate and coordinate their I/O strategy in order to avoid interfering with one another. In this work, we examine four I/O strategies that can be accommodated in this framework: serializing, interrupting, interfering and coordinating. Experiments on Argonne's BG/P Surveyor machine and on several clusters of the French Grid'5000 show how CALCioM can be used to efficiently and transparently improve the scheduling strategy between two otherwise interfering applications, given specified metrics of machine-wide efficiency.
  • Keywords
    input-output programs; parallel processing; scheduling; Argonne BG-P surveyor machine; CALCioM; French Grid'5000; HPC systems; IO interference mitigation; IO optimizations; application-side collective IO; cross-application coordination; disk-level request scheduling; file system side; network-level request scheduling; scheduling policies; scheduling strategy; shared parallel file system; storage performance; unmatched computation performance; Dynamic scheduling; Interference; Measurement; Optimization; Servers; Supercomputers; Throughput; CALCioM; Cross-Application Contention; I/O; Interference; Parallel File Systems
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
  • Conference_Location
    Phoenix, AZ
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4799-3799-8
  • Type
    conf
  • DOI
    10.1109/IPDPS.2014.27
  • Filename
    6877251