• DocumentCode
    2737769
  • Title

    CALYPSO: a novel software system for fault-tolerant parallel processing on distributed platforms

  • Author

    Baratloo, Arash ; Dasgupta, Partha ; Kedem, Zvi M.

  • Author_Institution
    New York Univ., NY, USA
  • fYear
    1995
  • fDate
    2-4 Aug 1995
  • Firstpage
    122
  • Lastpage
    129
  • Abstract
    The importance of adapting networks of workstations for use as parallel processing platforms is well established. However, current solutions do not always address important issues that exist in real networks. External factors, such as the sharing of resources, unpredictable behavior of the network, and failures, are present in multiuser networks and must be addressed. CALYPSO is a prototype software system for writing and executing parallel programs on non-dedicated platforms, based on COTS networked workstations, operating systems, and compilers. Among the notable properties of the system are: (1) a simple programming paradigm incorporating shared memory constructs and separating the programming and the execution parallelism, (2) transparent utilization of unreliable shared resources by providing dynamic load balancing and fault tolerance, and (3) effective performance for large classes of coarse-grained computations. We present the system and report our initial experiments and performance results in settings that closely resemble the dynamic behavior of a “real” network. Under varying workload conditions, resource availability, and process failures, the efficiency of the test program we present ranged from 84% to 94%, benchmarked against a sequential program.
  • Keywords
    fault tolerant computing; network operating systems; parallel processing; parallel programming; program compilers; CALYPSO; COTS networked workstations operating systems; coarse-grained computations; compilers; distributed platforms; dynamic load balancing; fault-tolerant parallel processing; multiuser networks; parallel programs; process failures; prototype software system; resource availability; shared memory constructs; software system; Dynamic programming; Fault tolerant systems; Operating systems; Parallel processing; Parallel programming; Program processors; Software prototyping; Software systems; Workstations; Writing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Distributed Computing, 1995., Proceedings of the Fourth IEEE International Symposium on
  • Conference_Location
    Washington, DC
  • ISSN
    1082-8907
  • Print_ISBN
    0-8186-7088-6
  • Type

    conf

  • DOI
    10.1109/HPDC.1995.518702
  • Filename
    518702