Title :
Distributed-shared memory computed tomography
Author :
de la Fuente, Francisco ; Torres, F. ; Rannou, Fernando R.
Author_Institution :
Dept. de Ing. Inf., Univ. de Santiago de Chile, Santiago, Chile
fDate :
Oct. 27, 2012 - Nov. 3, 2012
Abstract :
Large-scale statistical reconstruction algorithms are known to be memory- and processor-intensive applications. For instance, the system matrix for a small-animal scanner requires several gigabytes of storage, and the algorithm usually needs many iterations to produce acceptable images. In this work we design distributed-shared memory (DSM) statistical reconstruction algorithms that exploit all available computational resources as a unified infrastructure, thereby improving the cost-efficiency of the investment and the scalability of the system. We use and compare two distinct approaches. The first uses the Unified Parallel C (UPC) compiler, which transparently provides a global shared virtual address space across all computers. Data is physically stored on different computers, but a thread can access any shared item as if it were in its local memory. The second approach combines the OpenMP and Pthreads shared-memory libraries with the message-passing library MPI. In this case threads only have access to the node's local memory, and access to remote data is carried out explicitly through message passing. Early UPC experiments showed that keeping all data shared heavily degrades reconstruction performance. We therefore devised a distribution method in which some data is kept shared and the rest private, roughly mimicking the library-based approach. However, even with data privatization, the compiler solution cannot compete with the library solutions. We explore three workload distribution strategies: LOR-based, Nonzero-based and Core-based. The best performance is obtained with OpenMP+MPI and the Core-based balance algorithm, which reaches a speedup of 36 with 112 cores. Both OpenMP+MPI and Pthreads+MPI outperform UPC by a large margin. The low system efficiency of 0.32 is mainly due to the slow internode communication network.
Keywords :
application program interfaces; computerised tomography; distributed shared memory systems; image reconstruction; iterative methods; message passing; program compilers; statistical analysis; DSM statistical reconstruction algorithms; LOR-based workload distribution strategy; MPI; OpenMP libraries; Pthreads shared-memory libraries; UPC compiler; animal scanner; computational resources; core-based workload distribution strategy; data privatization; distributed-shared memory computed tomography; distributed-shared memory statistical reconstruction algorithms; distribution method; global shared virtual address space; internode communication network; investment cost-efficiency; iterations; large-scale statistical reconstruction algorithms; local memory; memory intensive applications; memory storage gigabytes; message-passing library; node local memory; nonzero-based workload distribution strategy; processor intensive applications; remote data; system matrix; unified infrastructure; unified parallel C compiler; Distributed-shared memory; Statistical Reconstruction;
Conference_Titel :
Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), 2012 IEEE
Conference_Location :
Anaheim, CA
Print_ISBN :
978-1-4673-2028-3
DOI :
10.1109/NSSMIC.2012.6551558