Title :
Programming FFT on DSM multiprocessors
Author :
Shan, Hongzhang ; Feng, Jianhua ; Shan, Hongzhong
Author_Institution :
Dept. of Comput. Sci., Princeton Univ., NJ, USA
Abstract :
The performance of the shared address space programming model for the coarse-grained communicating programs that have traditionally been common in scientific computing is not yet clear. We use the challenging 1-dimensional FFT, a regular coarse-grained program, as our driving application to study how to obtain high performance for this kind of application under the shared address space programming model on a hardware-supported cache-coherent distributed memory machine. We find that its performance is highly sensitive to data placement, and proper data placement is critical to the success of this kind of application. Prefetching can further improve performance by 10 to 50 percent for the data sets we studied. Naive programming easily creates performance bottlenecks by introducing much more contention, leading to large performance losses. When properly written, shared address space programs deliver much better performance than the other popular programming models, such as MPI and SHMEM.
Keywords :
cache storage; distributed shared memory systems; fast Fourier transforms; mathematics computing; parallel programming; software performance evaluation; FFT; MPI; SHMEM; cache-coherent distributed memory machine; coarse-grained communicating programs; data placement; data sets; distributed shared memory multiprocessors; fast Fourier transform; naive programming; performance bottleneck; prefetching; scientific computing; shared address space programming model;
Conference_Title :
High Performance Computing in the Asia-Pacific Region, 2000. Proceedings. The Fourth International Conference/Exhibition on
Conference_Location :
Beijing, China
Print_ISBN :
0-7695-0589-2
DOI :
10.1109/HPC.2000.843504