• DocumentCode
    1671766
  • Title
    Parallel processing on networks of workstations: a fault-tolerant, high performance approach
  • Author
    Dasgupta, Partha ; Kedem, Zvi M. ; Rabin, Michael O.
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Arizona State Univ., Tempe, AZ, USA
  • fYear
    1995
  • Firstpage
    467
  • Lastpage
    474
  • Abstract
    One of the most sought-after software innovations of this decade is the construction of systems that use off-the-shelf workstations to actually deliver, and even surpass, the power and reliability of supercomputers. Using novel techniques (eager scheduling, evasive memory layouts, and dispersed data management), it is possible to build an execution environment for parallel programs on workstation networks. These techniques were originally developed in a theoretical framework for an abstract machine that models a shared-memory asynchronous multiprocessor. The network-of-workstations platform presents an inherently asynchronous environment for the execution of our parallel programs. This gives rise to substantial problems of correctness of the computation and of proper automatic load balancing of the work among the processors, so that a slow processor will not hold up the total computation. A limiting case of asynchrony is when a processor becomes infinitely slow, i.e., fails. Our methodology copes with all these problems, as well as with memory failures. An interesting feature of this system is that it is neither a fault-tolerant system extended for parallel processing nor a parallel processing system extended for fault tolerance. The same novel mechanisms ensure both properties.
  • Keywords
    fault tolerant computing; parallel processing; parallel programming; performance evaluation; scheduling; abstract machine; automatic load balancing; correctness; dispersed data management; eager scheduling; evasive memory layouts; execution environment; fault-tolerant approach; high performance approach; memory failures; networks of workstations; parallel processing; parallel programs; shared memory asynchronous multiprocessor; Environmental management; Fault tolerance; Fault tolerant systems; Memory management; Parallel processing; Power system management; Power system reliability; Supercomputers; Technological innovation; Workstations;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Distributed Computing Systems, 1995., Proceedings of the 15th International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1063-6927
  • Print_ISBN
    0-8186-7025-8
  • Type
    conf
  • DOI
    10.1109/ICDCS.1995.500052
  • Filename
    500052
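The abstract's central claim is that a single mechanism yields both automatic load balancing and fault tolerance, because a failed processor is just the limiting case of a slow one. The small simulation below illustrates the general idea behind eager scheduling. It is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that tasks are independent and deterministic, so re-running a task gives the same result. It also assumes that any idle worker is handed an unfinished task, even one already running elsewhere, and that the first completion of a task wins while later duplicates are discarded. The function name `eager_schedule`, its parameters, and the "fewest current executions" selection rule are hypothetical choices made for this illustration. A failed workstation is modelled as one whose per-task time is `math.inf`.

```python
import heapq
import math


def eager_schedule(tasks, worker_times):
    """Simulate eager scheduling of independent tasks (illustrative sketch).

    tasks: list of zero-argument callables, assumed deterministic so that
        re-executing a task yields the same result.
    worker_times: per-worker time to run one task; math.inf models a worker
        that has failed (an "infinitely slow" processor).
    Returns (results, finish_time).
    """
    n = len(tasks)
    results = [None] * n
    done = [False] * n
    in_flight = [0] * n   # number of workers currently running task i
    events = []           # min-heap of (completion_time, worker, task)
    remaining = n

    def pick_task():
        # Hypothetical eager rule: pick the unfinished task with the fewest
        # current executions, breaking ties by lowest index. Finished tasks
        # are never handed out again.
        candidates = [i for i in range(n) if not done[i]]
        if not candidates:
            return None
        return min(candidates, key=lambda i: (in_flight[i], i))

    def assign(worker, now):
        task = pick_task()
        if task is not None:
            in_flight[task] += 1
            heapq.heappush(events, (now + worker_times[worker], worker, task))

    for w in range(len(worker_times)):
        assign(w, 0.0)

    now = 0.0
    while remaining > 0:
        # Only failed workers (infinite completion times) are left.
        if not events or math.isinf(events[0][0]):
            raise RuntimeError("no live worker left")
        now, worker, task = heapq.heappop(events)
        in_flight[task] -= 1
        if not done[task]:
            # First completion wins; later duplicate completions are ignored,
            # which keeps repeated execution harmless for deterministic tasks.
            results[task] = tasks[task]()
            done[task] = True
            remaining -= 1
        # The now-idle worker immediately receives more work, if any remains.
        assign(worker, now)
    return results, now


if __name__ == "__main__":
    squares = [lambda i=i: i * i for i in range(6)]
    # Worker 0 is fast, worker 1 is slower, worker 2 has failed.
    print(eager_schedule(squares, [1.0, 2.0, math.inf]))
    # prints ([0, 1, 4, 9, 16, 25], 4.0)
```

In the example, task 2 is first given to the failed worker. It is later re-issued to worker 0, which completes it, so the computation finishes at time 4.0 without any explicit failure detection. The same re-issuing rule is what keeps the slower worker 1 from delaying the overall result. Duplicated work is the price of this approach: a faster worker may redo a task that a slower one is still running.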