DocumentCode
1705561
Title
Decoupled vector architectures
Author
Espasa, Roger ; Valero, Mateo
Author_Institution
Dept. d'Arquitectura de Computadors, Univ. Politecnica de Catalunya, Barcelona, Spain
fYear
1996
Firstpage
281
Lastpage
290
Abstract
The purpose of this paper is to show that decoupling techniques can greatly improve the performance of vector programs on a vector processor. Using a trace-driven approach, we simulate a selection of the Perfect Club programs and compare their execution time on a conventional vector architecture and on a decoupled vector architecture. Decoupling provides a performance advantage of more than a factor of two for realistic memory latencies, and even with an ideal memory system with no latency, there is still a speedup of as much as 50%. We introduce a bypassing technique between the load/store queues and show that it can give an extra speedup of up to 22% while also reducing total memory traffic by an average of 20%. An important part of this paper is devoted to studying the tradeoffs involved in choosing an adequate size for the different queues of the architecture, so that the hardware cost of the queues can be minimized while still retaining most of the performance advantages of decoupling.
Keywords
performance evaluation; vector processor systems; Perfect Club programs; bypassing technique; decoupled vector architectures; hardware cost; performance; performance advantages; realistic memory latencies; total memory traffic; trace driven approach; vector processor; Computational modeling; Computer aided instruction; Computer architecture; Costs; Delay; Hardware; Multithreading; Parallel processing; Vector processors; Yarn
fLanguage
English
Publisher
IEEE
Conference_Titel
High-Performance Computer Architecture, 1996. Proceedings., Second International Symposium on
Conference_Location
San Jose, CA
Print_ISBN
0-8186-7237-4
Type
conf
DOI
10.1109/HPCA.1996.501193
Filename
501193