• DocumentCode
    2649740
  • Title

    High availability of the memory hierarchy in a cluster

  • Author

    Morin, Christine ; Lottiaux, Renaud ; Kermarrec, Anne-Marie

  • Author_Institution
    IRISA/INRIA, Campus Univ. de Beaulieu, Rennes, France
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    134
  • Lastpage
    143
  • Abstract
    A single-level store (SLS) integrating a shared virtual memory and a parallel file system with file mapping as its interface is attractive for the execution of high-performance applications in a cluster. However, the probability of a node reboot or failure is quite high. In this paper, we present the design of a highly available SLS system. Our approach combines checkpointing in memory and permanent checkpointing on disk in a cluster using all cluster memory and disk resources. Preliminary performance results show the applicability of the proposed approach for parallel applications with huge input/output requirements.
  • Keywords
    parallel memories; performance evaluation; shared memory systems; system recovery; virtual storage; workstation clusters; cluster computing; cluster disk resources; cluster memory resources; file mapping interface; high-performance applications; highly available system; input/output requirements; memory checkpointing; memory hierarchy availability; node failure; node reboot; parallel applications; parallel file system; performance; permanent on-disk checkpointing; shared virtual memory; single-level store; Availability; Bandwidth; Bit error rate; Checkpointing; Fault tolerance; File systems; Laser sintering; Memory management; Microprocessors; Support vector machines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Reliable Distributed Systems, 2000. SRDS-2000. Proceedings The 19th IEEE Symposium on
  • Conference_Location
    Nürnberg
  • Print_ISBN
    0-7695-0543-0
  • Type

    conf

  • DOI
    10.1109/RELDI.2000.885401
  • Filename
    885401