Title :
Evaluation of a memory hierarchy for the MTS multithreaded processor
Author :
Gallagher, W. Lynn ; Wu, Chuan-Lin
Author_Institution :
Dept. of Electr. & Comput. Eng., Texas Univ., Austin, TX, USA
Abstract :
Executing multiple threads simultaneously on a superscalar processor can improve hardware resource utilization and instruction throughput. The Multi-Threaded Superscalar (MTS) processor efficiently achieves the concurrent execution of multiple instruction streams using a VLIW, multiple functional unit architecture. However, the limitations of the memory system may impede the potential performance of the MTS processor. An interactive, parameter-driven simulator of the MTS architecture was developed using SES/workbench. A set of numerical benchmarks was run on it with varying memory system configurations. Assuming a single instruction cache, a single data cache, and one instruction queue per thread, varied parameters included the size of the instruction queues, the number of ports to the instruction cache, main memory latency, and cache hit rates. Based on simulation results, optimal values were chosen for certain parameters. To reasonably utilize the MTS processor, the memory system must provide at least 64 instruction bytes per cycle, though the demands for data access are far less severe. For reasonable memory speeds, this requires a roughly 1 MB three-ported instruction cache capable of providing 128 bits per port per cycle, as well as a 1 MB single-ported, 32-bit-wide data cache. A more realistic multilevel cache hierarchy is proposed.
Keywords :
cache storage; discrete event simulation; instruction sets; parallel processing; program testing; MTS multithreaded processor; SES/workbench; VLIW; cache hit rates; hardware resource utilization; instruction cache; instruction throughput; main memory latency; memory hierarchy evaluation; memory system configurations; multiple functional unit architecture; multiple instruction streams; multithreaded superscalar processor; numerical benchmarks; parameter-driven simulator; realistic multilevel cache hierarchy; superscalar processor; Bandwidth; Computer architecture; Delay; Hardware; Impedance; Registers; Resource management; Throughput; Yarn
Conference_Title :
Parallel and Distributed Systems, 1997. Proceedings., 1997 International Conference on
Conference_Location :
Seoul
Print_ISBN :
0-8186-8227-2
DOI :
10.1109/ICPADS.1997.652572