Title :
Effectiveness of register preloading on CP-PACS node processor
Author :
Nakamura, Hiroshi ; Itakura, Ken'ichi ; Matsubara, Masazumi ; Boku, Taisuke ; Nakazawa, Kisaburo
Author_Institution :
Res. Center for Adv. Sci. & Technol., Tokyo Univ., Japan
Abstract :
CP-PACS is a massively parallel processor (MPP) for large-scale scientific computations. In September 1996, CP-PACS, equipped with 2048 processors, began operation at the University of Tsukuba. At that time, CP-PACS was the fastest MPP in the world on the LINPACK benchmark. CP-PACS was designed to achieve very high performance in large scientific/engineering applications. It is well known that an ordinary data cache is not effective in such applications because the data size is much larger than the cache size and because there is little temporal locality. Thus, a special mechanism for hiding long memory access latency is indispensable. Cache prefetching is a well-known technique for this purpose. In addition to cache prefetching, CP-PACS node processors implement a register preloading mechanism. This mechanism enables the processor to transfer required floating-point data directly (not via the data cache) between main memory and floating-point registers in a pipelined way. We compare register preloading with cache prefetching by measuring the real performance of the CP-PACS processor and the HP PA-8000 processor, which implement cache prefetching and/or register preloading.
Keywords :
cache storage; parallel machines; performance evaluation; storage management; CP-PACS; HP PA-8000; cache prefetching; long memory access latency; massively parallel processor; performance; register preloading; Application software; Computer science; Concurrent computing; Delay; Large-scale systems; Physics computing; Prefetching; Registers; Throughput;
Conference_Title :
Innovative Architecture for Future Generation High-Performance Processors and Systems, 1997
Conference_Location :
Maui, HI
Print_ISBN :
0-8186-8424-0
DOI :
10.1109/IWIA.1997.670412