DocumentCode :
1668552
Title :
Trident: technology-scalable architecture for data parallel applications
Author :
Sedukhin, Stanislav G. ; Soliman, Mostafa I.
Author_Institution :
Graduate Sch. of Comput. Sci. & Eng., Univ. of Aizu, Fukushima, Japan
fYear :
2003
Abstract :
Within the current decade, process technology promises more than one billion transistors on a single die, operating at frequencies above 10 GHz. We proposed the Trident processor, which uses a multi-level ISA to express data parallelism to hardware. Trident is scalable because its architecture is regular and can be widely replicated to efficiently harness the available transistor budget. It is also based on local communication, which suits the high operating frequencies of future VLSI technology. This paper discusses the Trident processor architecture and evaluates its performance on the Basic Linear Algebra Subprograms (BLAS), which are widely used in many data parallel applications. Two measures are used to evaluate the performance of the Trident processor on BLAS: the TFLOPS rate on infinite-size problems (R∞), which is primarily a characteristic of the computer technology, and the problem size needed to reach one-half of R∞ (N1/2), which is a measure of the amount of parallelism in a computer architecture. With 128 parallel Trident lanes and a 10 GHz operating frequency, which are possible in the billion-transistor era, R∞ for dot product, matrix-vector multiplication, and matrix-matrix multiplication is 1.1, 1.8, and 2.5 TFLOPS, respectively. In addition, N1/2 increases when moving from the low level to the high level of BLAS.
Keywords :
matrix multiplication; parallel architectures; parallel programming; performance evaluation; vectors; 1.1 TFLOPS; 1.8 TFLOPS; 10 GHz; 2.5 TFLOPS; BLAS; Basic Linear Algebra Subprograms; Trident; VLSI; computer architecture; data parallel applications; dot product multiplication; local communication; matrix-matrix multiplication; matrix-vector multiplication; multi-level ISA; parallel architecture; performance; technology-scalable architecture; Application software; Computer architecture; Concurrent computing; Frequency; Hardware; Instruction sets; Linear algebra; Parallel processing; Size measurement; Very large scale integration;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing Symposium, 2003. Proceedings. International
ISSN :
1530-2075
Print_ISBN :
0-7695-1926-1
Type :
conf
DOI :
10.1109/IPDPS.2003.1213477
Filename :
1213477