• DocumentCode
    1854074
  • Title
    Checkpointing Process Groups in a Grid Environment
  • Author
    Mehnert-Spahn, John ; Schottner, Michael ; Morin, Christine
  • Author_Institution
    Dept. of Comput. Sci., Heinrich-Heine Univ., Duesseldorf
  • fYear
    2008
  • fDate
    1-4 Dec. 2008
  • Firstpage
    243
  • Lastpage
    251
  • Abstract
    The EU-funded XtreemOS project implements a grid operating system that transparently exploits the resources of virtual organizations through the standard POSIX interface. Grid checkpointing and restart requires saving and restoring jobs executing in a distributed heterogeneous grid environment. The latter may span millions of grid nodes (PCs, clusters, and mobile devices) using different system-specific checkpointers that save and restore application and kernel data structures for processes executing on a grid node. In this paper we briefly describe the XtreemOS grid checkpointing architecture and how we bridge the gap between the abstract grid and the system-specific checkpointers. Then we discuss how we keep track of processes and how different process grouping techniques are managed to ensure that all processes of a job, and any further dependent ones, can be checkpointed and restarted. Finally, we present how Linux control groups can be used to address resource isolation issues during restart.
  • Keywords
    checkpointing; data structures; grid computing; software architecture; Linux control groups; POSIX interface; XtreemOS grid checkpointing architecture; checkpointing process; distributed heterogeneous grid environment; kernel data structures; resource isolation; virtual organizations; Checkpointing; Computer science; Kernel; Linux; Middleware; Operating systems; Personal communication networks; Power system management; Power system security; Resource management; fault tolerance
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Computing, Applications and Technologies, 2008. PDCAT 2008. Ninth International Conference on
  • Conference_Location
    Otago
  • Print_ISBN
    978-0-7695-3443-5
  • Type
    conf
  • DOI
    10.1109/PDCAT.2008.14
  • Filename
    4710987