DocumentCode :
3459733
Title :
Studying Asynchronous Shared Memory Computations
Author :
Juvaste, Simo
Author_Institution :
Univ. of Joensuu, Joensuu
fYear :
2007
fDate :
15-19 Sept. 2007
Firstpage :
413
Lastpage :
413
Abstract :
We present an experimental framework, the F-PRAM model, for modeling, studying, and teaching the impact of different properties of parallel computers. The performance model is a parameterized model similar to BSP [4] and LogP [1], with some additional parameters for more refined analysis when needed. To use the model, we present an emulation and experimentation system for studying the impact of the parameters and of communication network design decisions on application algorithm performance. Although parameterized models are well suited to algorithm development and analysis, they cannot accurately represent a complex real parallel computer, for example, the hierarchical structure of computing clusters. Consequently, we use a simulated network as a basis for comparison. With the simulated network approach, we can simulate any network topology, link/node speed, and routing algorithm. Using the simulated system, we can execute either benchmarks to measure the values of F-PRAM parameters (such as latency and bandwidth), or application algorithms to analyze their performance. The emulator takes as input the application algorithm (in a high-level language), input data, machine parameters (such as the number of processors and memory modules), other details (such as memory module latency and bandwidth), interconnection network topology and properties, routing algorithm (in C), length of buffers, and memory allocation (hash) scheme. During the execution, the program can produce any output, but usually we are interested in the number of clock cycles needed for the execution. To use the system efficiently, we have an automated measurement system that takes sets of configuration parameters, executes the simulation for every combination of the parameters, records the results, and visualizes them with one or more graphs.
For example, we select an application algorithm, an input size, a number of processors, a set of different shapes of 3D mesh network, and a mesh usage sparseness to see which is optimal for our algorithm. Or, we can select a set of hash algorithms and variations of the application algorithm to see if some access patterns perform better than others.
Our programming model is a simplified Modula-2 with an additional par-do construct that divides the execution into threads for parallel execution. Shared memory data (variables) must be asynchronously pre-fetched into local variables before use. Similarly, shared memory is updated with asynchronous writes. The programmer is responsible for memory consistency. The programmer can use the values of the model parameters in the program to make it adapt to machine properties. In the case of a simulated network, the routing algorithm can collect statistics on the delays and adjust the parameters accordingly at runtime. Our current set of example algorithms includes generic benchmarks (for latency and bandwidth), maximum finding, odd-even merge sort, matrix multiplication, matrix inversion, and image smoothing. Detailed descriptions of the algorithms can be found in [2]. Our current set of network topologies includes butterfly (deflection routing), hypercube, and 3D mesh/torus. We are planning to implement hierarchical memory structures, e.g., a mesh of SMP nodes. The current version of the system is available at [3].
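The automated measurement described in the abstract (run the simulation once for every combination of the configuration parameters and record the results) can be illustrated with a minimal Python sketch. This is not the authors' system: the function `sweep`, the parameter names, and in particular `toy_cycles` are hypothetical. `toy_cycles` is a made-up stand-in for the emulator and does not implement the F-PRAM cost model.

```python
from itertools import product

def sweep(param_sets, run):
    """Call `run` once for every combination of the values in `param_sets`."""
    names = list(param_sets)
    results = []
    for values in product(*(param_sets[name] for name in names)):
        config = dict(zip(names, values))
        results.append((config, run(config)))
    return results

def toy_cycles(config):
    # Hypothetical stand-in for the emulator, NOT the F-PRAM model:
    # per-processor work plus mesh diameter times a per-hop latency.
    x, y, z = config["mesh_shape"]
    diameter = (x - 1) + (y - 1) + (z - 1)
    work = config["input_size"] // config["processors"]
    return work + diameter * config["hop_latency"]

# Assumed parameter sets: three 64-node 3D mesh shapes, two hop latencies.
param_sets = {
    "mesh_shape": [(4, 4, 4), (8, 4, 2), (16, 2, 2)],
    "processors": [64],
    "input_size": [4096],
    "hop_latency": [1, 4],
}

results = sweep(param_sets, toy_cycles)
best_config, best_cycles = min(results, key=lambda item: item[1])
print(len(results), best_config["mesh_shape"], best_cycles)
```

In a real setup, `run` would invoke the emulator with the given configuration and return the measured number of clock cycles; the recorded `(config, cycles)` pairs could then be plotted.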
Keywords :
parallel processing; shared memory systems; F-PRAM model; application algorithm performance; asynchronous shared memory computation; communication network design; image smoothing; link/node speed; matrix inversion; matrix multiplication; maximum finding; network topology; odd-even merge sort; parallel computers; routing algorithm; simulated network approach; Algorithm design and analysis; Bandwidth; Clustering algorithms; Computational modeling; Concurrent computing; Delay; Network topology; Performance analysis; Programming profession; Routing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Architecture and Compilation Techniques, 2007. PACT 2007. 16th International Conference on
Conference_Location :
Brasov
ISSN :
1089-795X
Print_ISBN :
978-0-7695-2944-8
Type :
conf
DOI :
10.1109/PACT.2007.4336241
Filename :
4336241