Title :
Fault tolerance of allocation schemes in massively parallel computers
Author :
Livingston, Marilynn ; Stout, Quentin F.
Author_Institution :
Dept. of Comput. Sci., Southern Illinois Univ., Edwardsville, IL, USA
Abstract :
The authors examine the problem of locating and allocating large fault-free subsystems in multiuser massively parallel computer systems. Because the allocation schemes used in such large systems cannot allocate every possible subsystem, fault tolerance is reduced. The effects of different allocation methods, including the buddy and Gray-coded buddy schemes, on the allocation of subsystems in the hypercube and in the two-dimensional mesh and torus are analyzed. Both worst-case and expected-case performance are studied. Generalizing the buddy and Gray-coded schemes, the authors introduce a family of allocation schemes that significantly improves fault tolerance over the existing schemes while using relatively few additional resources. For comparison, the behavior of the various schemes is studied on the allocation of subsystems of 2^18 processors in a hypercube, mesh, and torus of 2^20 processors. The methods combine analytical techniques and simulation.
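The abstract names the buddy and Gray-coded buddy schemes without defining them. The Python below is a minimal sketch that assumes a common formulation. The buddy scheme allocates aligned blocks of 2^k consecutive node addresses in an n-cube. The Gray-coded variant allocates runs of 2^k consecutive positions along a binary-reflected Gray-code ordering of the nodes, starting at multiples of 2^(k-1) and wrapping cyclically. All function names are hypothetical. This is not the authors' method, and it does not model their generalized family of schemes. It shows why recognizing fewer subcubes costs fault tolerance: a fault pattern can block every buddy candidate while leaving a Gray-coded candidate free.

```python
from math import comb


def gray(i: int) -> int:
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def buddy_subcubes(n: int, k: int) -> list[frozenset[int]]:
    """k-subcubes a plain buddy scheme (assumed form) can allocate in an
    n-cube: aligned blocks of 2**k consecutive node addresses."""
    size = 1 << k
    return [frozenset(range(j * size, (j + 1) * size))
            for j in range(1 << (n - k))]


def gray_buddy_subcubes(n: int, k: int) -> list[frozenset[int]]:
    """k-subcubes recognized by a Gray-coded buddy scheme (assumed form):
    runs of 2**k positions along the Gray-code ordering of the nodes,
    starting at multiples of 2**(k-1) and wrapping around cyclically."""
    total = 1 << n
    size = 1 << k
    step = 1 << (k - 1) if k > 0 else 1
    seen, result = set(), []
    for start in range(0, total, step):
        block = frozenset(gray((start + t) % total) for t in range(size))
        if block not in seen:  # k == n yields the whole cube more than once
            seen.add(block)
            result.append(block)
    return result


def is_subcube(nodes: frozenset[int], k: int) -> bool:
    """True if `nodes` is exactly one k-dimensional subcube."""
    if len(nodes) != 1 << k:
        return False
    base = next(iter(nodes))
    varying = 0
    for x in nodes:
        varying |= x ^ base  # collect bits in which some node differs from base
    return bin(varying).count("1") == k


def first_fault_free(candidates, faults):
    """Return the first candidate subcube (sorted) with no faulty node, or None."""
    for block in candidates:
        if not (block & faults):
            return sorted(block)
    return None


if __name__ == "__main__":
    n, k = 4, 2
    b, g = buddy_subcubes(n, k), gray_buddy_subcubes(n, k)
    print(len(b), len(g), comb(n, k) * 2 ** (n - k))  # 4 8 24
    faults = {1, 6, 9, 12}  # one fault inside every buddy block
    print(first_fault_free(b, faults))  # None
    print(first_fault_free(g, faults))  # [10, 11, 14, 15]
```

These assumed definitions give counts for 1 ≤ k ≤ n-1. The buddy scheme recognizes 2^(n-k) of the C(n,k)·2^(n-k) k-subcubes, and the Gray-coded scheme recognizes 2^(n-k+1). In the comparison case from the abstract (n = 20, k = 18), that is 4 and 8 subcubes out of 190·4 = 760.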
Keywords :
fault tolerant computing; parallel machines; Gray-coded buddy schemes; allocation schemes; analytical techniques; buddy; expected-case performance; fault tolerance; hypercube; large fault-free subsystems; massively parallel computers; simulation; torus; two-dimensional mesh; Analytical models; Computational modeling; Computer science; Concurrent computing; Fault tolerance; Fault tolerant systems; Hypercubes; Multiprocessor interconnection networks; Parallel machines; Resource management;
Conference_Title :
Proceedings of the 2nd Symposium on the Frontiers of Massively Parallel Computation, 1988
Conference_Location :
Fairfax, VA
Print_ISBN :
0-8186-5892-4
DOI :
10.1109/FMPC.1988.47483