Title :
Network multicomputing using recoverable distributed shared memory
Author :
Carter, J.B. ; Cox, A.L. ; Dwarkadas, S. ; Elnozahy, E.N. ; Johnson, D.B. ; Keleher, P. ; Rodrigues, S. ; Yu, W. ; Zwaenepoel, W.
Author_Institution :
Dept. of Comput. Sci., Rice Univ., Houston, TX, USA
Abstract :
A network multicomputer is a multiprocessor in which the processors are connected by general-purpose networking technology, in contrast to current distributed memory multiprocessors where a dedicated special-purpose interconnect is used. The advent of high-speed general-purpose networks provides the impetus for a new look at the network multiprocessor model, by removing the bottleneck of current slow networks. However, major software issues remain unsolved. It is pointed out that a convenient machine abstraction must be developed that hides from the application programmer low-level details such as message passing or machine failures. Use is made of distributed shared memory as a programming abstraction, and rollback recovery through consistent checkpointing to provide fault tolerance. Measurements of the authors' implementations of distributed shared memory and consistent checkpointing show that these abstractions can be implemented efficiently.
Keywords :
computer networks; distributed memory systems; fault tolerant computing; shared memory systems; user interfaces; application programmer; consistent checkpointing; distributed shared memory; fault tolerance; general-purpose networking technology; high-speed general-purpose networks; machine abstraction; machine failures; message passing; multiprocessor; network multicomputer; network multiprocessor model; programming abstraction; rollback recovery; software issues; Application software; Bandwidth; Checkpointing; Computer science; Hardware; Message passing; Multiprocessor interconnection networks; Programming profession; Supercomputers; Workstations;
Conference_Title :
Compcon Spring '93, Digest of Papers.
Conference_Location :
San Francisco, CA, USA
Print_ISBN :
0-8186-3400-6
DOI :
10.1109/CMPCON.1993.289729