DocumentCode :
2265336
Title :
Global address space, non-uniform bandwidth: a memory system performance characterization of parallel systems
Author :
Stricker, T. ; Gross, T.
Author_Institution :
Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
fYear :
1997
fDate :
1-5 Feb 1997
Firstpage :
168
Lastpage :
179
Abstract :
Many parallel systems offer a simple view of memory: all storage cells are addressed uniformly. Despite this uniform view, the machines differ significantly in their memory system performance (and may offer slightly different consistency models). Cached and local memory accesses are much faster than remote reads of data generated by another processor or remote writes of data intentionally pushed to memories close to another processor. The bandwidth to and from cache and local memory can be an order of magnitude (or more) higher than the bandwidth to and from remote memory. The situation is further complicated by the heavy influence of the access pattern (i.e., the spatial locality of reference) on both local and remote memory system bandwidth. On these modern machines, a compiler for a parallel system faces a number of options for accomplishing a data transfer efficiently. Choosing the best option requires a cost-benefit model, obtained through an empirical evaluation of memory system performance. We evaluate three DEC Alpha-based parallel systems to demonstrate the practicality of this approach: the DEC 8400, the Cray T3D, and the Cray T3E. The common DEC Alpha processor architecture facilitates a direct comparison of memory system performance. The three systems differ in their clock speed, their scalability, and the amount of coherency they provide.
Keywords :
cache storage; memory architecture; parallel processing; performance evaluation; program compilers; Cray T3D; Cray T3E; DEC 8400; DEC Alpha based parallel systems; DEC-Alpha processor architecture; access pattern; cache storage; clock speed; coherency; compiler; cost benefit model; empirical evaluation; global address space; local memory; local memory accesses; memory system performance characterization; nonuniform bandwidth; parallel systems; remote write; scalability; spatial locality; Bandwidth; Clocks; Computer science; Computerized monitoring; Concurrent computing; Contracts; Modems; Optimizing compilers; Read-write memory; System performance;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High-Performance Computer Architecture, 1997., Third International Symposium on
Conference_Location :
San Antonio, TX
Print_ISBN :
0-8186-7764-3
Type :
conf
DOI :
10.1109/HPCA.1997.569658
Filename :
569658