• DocumentCode
    1991064
  • Title

    Checkpoint/rollback in a distributed system using coarse-grained dataflow

  • Author

    Cummings, D. ; Alkalaj, L.

  • Author_Institution
    Jet Propulsion Lab., California Inst. of Technol., Pasadena, CA, USA
  • fYear
    1994
  • fDate
    15-17 June 1994
  • Firstpage
    424
  • Lastpage
    433
  • Abstract
    The Common Spaceborne Multicomputer Operating System (COSMOS) is a spacecraft operating system for distributed memory multiprocessors, designed to meet the on-board computing requirements of long-life interplanetary missions. One of the main features of COSMOS is software-implemented fault-tolerance, including 2-way voting, 3-way voting, and checkpoint/rollback. This paper describes the COSMOS distributed checkpoint/rollback approach, which exploits the fact that a COSMOS application program is based on a coarse-grained dataflow programming paradigm and therefore most of the state of a distributed application program is contained in the data tokens. Furthermore, all computers maintain a consistent view of this dynamic state, which facilitates the implementation of a coordinated checkpoint.
  • Keywords
    aerospace computing; concurrency control; distributed memory systems; fault tolerant computing; operating systems (computers); parallel processing; software reliability; space vehicles; 2-way voting; 3-way voting; COSMOS; Common Spaceborne Multicomputer Operating System; checkpoint; coarse-grained dataflow; coarse-grained dataflow programming paradigm; coordinated checkpoint; data tokens; distributed application program; distributed memory multiprocessors; distributed system; long-life interplanetary missions; on-board computing requirements; rollback; software-implemented fault-tolerance; spacecraft operating system; Distributed computing; Fault tolerance; Operating systems; Orbital robotics; Propulsion; Real time systems; Robot kinematics; Space technology; Space vehicles; Voting;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fault-Tolerant Computing, 1994. FTCS-24. Digest of Papers., Twenty-Fourth International Symposium on
  • Conference_Location
    Austin, TX, USA
  • Print_ISBN
    0-8186-5520-8
  • Type

    conf

  • DOI
    10.1109/FTCS.1994.315619
  • Filename
    315619