The reconfigurable ring of processors: fine-grained tree-structured computations

Author

Rosenberg, Arnold L. ; Scarano, Vittorio ; Sitaraman, Ramesh K.

Author_Institution

Dept. of Comput. Sci., Massachusetts Univ., Amherst, MA, USA

fYear

1994

fDate

26-29 Oct 1994

Firstpage

470

Lastpage

477

Abstract

We study fine-grained parallel computation on a reconfigurable ring of processors (denoted by ℛℛ𝒫). The ring of processors is endowed with a very flexible reconfigurable bus. The bus has some number of lines, each having one-packet width, that can be configured to establish arbitrary point-to-point connections independently for each line. We assume that the ℛℛ𝒫s we study have been implemented so that the latency for transmitting messages is logarithmic in the number of processors the message passes over in transit. We present an algorithm that allows an N-processor ℛℛ𝒫 with w lines to perform the broadcast operation (and any "leveled tree-structured" computation like parallel prefix) in time at most (log² N / log w) + log N log log w. We prove that this algorithm's performance can be improved by at most a constant factor, both when the buswidth w is "small", so that the first term dominates, and when w is "large", so that the second term dominates. Further, we expose a fundamental, architecture-independent limitation imposed by the logarithmic communication latency model: we prove that for a broad range of parallel architectures, including any N-processor ℛℛ𝒫, any operation that requires one processor to receive information, directly or indirectly, from all other processors requires time proportional to log N log log N.
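
The two terms of the stated upper bound can be compared numerically to see which one dominates for a given N and w. The following is a minimal sketch, not taken from the paper: the function names `broadcast_time_bound` and `dominant_term` are hypothetical, logarithms are assumed to be base 2, and all constant factors are dropped, so the values indicate growth only, not actual running times.

```python
import math


def broadcast_time_bound(n: int, w: int) -> tuple[float, float]:
    """Return the two terms of the abstract's upper bound for an
    n-processor ring with w bus lines.

    Assumptions (not from the source): base-2 logarithms and no
    constant factors.
    """
    if n < 2 or w < 2:
        raise ValueError("n and w must both be at least 2")
    log_n = math.log2(n)
    log_w = math.log2(w)
    first = log_n ** 2 / log_w          # (log^2 N) / log w
    second = log_n * math.log2(log_w)   # log N * log log w
    return first, second


def dominant_term(n: int, w: int) -> str:
    """Report which term of the bound is larger for the given n and w."""
    first, second = broadcast_time_bound(n, w)
    return "first" if first >= second else "second"


if __name__ == "__main__":
    # Sweep the buswidth for a fixed ring size to see the crossover.
    n = 2 ** 16
    for w in (2, 4, 16, 256, 2 ** 16):
        first, second = broadcast_time_bound(n, w)
        print(f"w={w:>6}: first={first:8.2f} second={second:8.2f} "
              f"dominant={dominant_term(n, w)}")
```

Under these assumptions the two terms balance roughly where log w · log log w equals log N; narrower buses fall in the "small" regime and wider buses in the "large" regime described in the abstract.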

Keywords

message passing; multiprocessor interconnection networks; parallel algorithms; parallel architectures; performance evaluation; reconfigurable architectures; system buses; trees (mathematics); algorithm performance; broadcast operation; buswidth; fine-grained tree-structured computation; flexible reconfigurable bus; logarithmic; logarithmic communication latency model; message transmission; one-packet width; parallel computation; parallel prefix; point-to-point connections; reconfigurable processor ring; tree-structured; Broadcasting; Computer architecture; Computer science; Concurrent computing; Costs; Delay; Parallel architectures; Semiconductor device modeling

fLanguage

English

Publisher

IEEE

Conference_Titel

Parallel and Distributed Processing, 1994. Proceedings. Sixth IEEE Symposium on

Conference_Location

Dallas, TX

Print_ISBN

0-8186-6427-4

Type

conf

DOI

10.1109/SPDP.1994.346133

Filename

346133