Title :
N latency 2N I/O-bandwidth 2D-array matrix multiplication algorithm
Author :
Oudjida, A.K. ; Titri, S. ; Hamarlain, M.
Author_Institution :
Microelectron. & Robotics Labs., CDTA, Algiers, Algeria
fDate :
2001
Abstract :
The emergence of the systolic paradigm in 1978 inspired the first 2D-array parallelization of the sequential matrix multiplication algorithm. Since then, owing to its attractive features, the systolic approach has gained momentum to the point where all 2D-array parallelization attempts have been exclusively systolic. Latency has been reduced successively (5N, 3N, 2N, 3N/2), where N is the matrix size. But as latency decreased, further irregularities were introduced into the array, severely compromising implementation at either the VLSI level or the system level. The best illustration of such irregularities is the pair of designs proposed by Tsay and Chang in 1995, considered the fastest designs (3N/2) developed so far. The purpose of this paper is twofold: we first demonstrate that N+√N/2 is the minimal latency that can be achieved using the systolic approach. We then introduce a fully parallel 2D-array algorithm with N latency and 2N I/O-bandwidth. This novel algorithm is not only the fastest but also the most regular.
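To make the latency figures in the abstract concrete, the following minimal sketch evaluates each quoted expression for a given matrix size N. It is only an illustration of the arithmetic, not the paper's algorithm. The function name `latencies` is hypothetical, the unit is taken to be abstract time steps, and the systolic bound is read as N + (√N)/2, which is an assumption about how the expression in the abstract groups.

```python
import math


def latencies(n):
    """Evaluate the latency expressions quoted in the abstract for matrix size n.

    Values are in abstract time steps (an assumption; the abstract gives no unit).
    """
    return {
        "5N": 5 * n,
        "3N": 3 * n,
        "2N": 2 * n,
        "3N/2 (Tsay and Chang, 1995)": 3 * n / 2,
        # Assumed reading of "N+sqrt(N)/2": N + (sqrt(N) / 2).
        "N + sqrt(N)/2 (systolic bound, assumed reading)": n + math.sqrt(n) / 2,
        "N (fully parallel algorithm)": n,
    }


if __name__ == "__main__":
    # Print one line per design class for an example size of N = 64.
    for name, steps in latencies(64).items():
        print(f"{name:>50}: {steps:g}")
```

Under the assumed reading, the gap between the systolic bound and the fully parallel N latency grows only as √N/2, so the two are close for small matrices.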
Keywords :
VLSI; matrix multiplication; parallel algorithms; systolic arrays; 2D-array parallelization; VLSI level; full-parallel algorithm; latency; sequential matrix multiplication algorithm; system level; systolic paradigm; Algorithm design and analysis; Delay; Iterative algorithms; Laboratories; Matrix decomposition; Microelectronics; Parallel robots; Systolic arrays; Tin; Very large scale integration;
Conference_Titel :
Electronics, Circuits and Systems, 2001. ICECS 2001. The 8th IEEE International Conference on
Print_ISBN :
0-7803-7057-0
DOI :
10.1109/ICECS.2001.957547