DocumentCode :
3338255
Title :
High performance median filtering using commodity graphics hardware
Author :
Chen, Wei ; Beister, Marcel ; Kyriakou, Yiannis ; Kachelries, M.
Author_Institution :
Inst. of Med. Phys. (IMP), Univ. of Erlangen-Nürnberg, Erlangen, Germany
fYear :
2009
fDate :
Oct. 24 2009-Nov. 1 2009
Firstpage :
4142
Lastpage :
4147
Abstract :
Median filtering is a commonly used technique in smoothing and denoising applications. Based on the vector programming model of modern commodity graphics processing units (GPUs), which directly support min/max operations and compare-and-select as fundamental instructions, we implemented the branchless vectorized median (BVM) filter proposed in reference [1] using NVIDIA's Compute Unified Device Architecture (CUDA). The BVM filter keeps track of a sorted array from which values are deleted and into which new values are inserted. Although it has O(M²) computational complexity, whereas other sorting algorithms have O(M ln M) complexity, it may outperform other implementations, at least for typical data. The main reason is that the algorithm is branchless and exploits data-level parallelism, so its runtime is data-independent and highly predictable. We describe important design criteria such as a memory layout that allows fast access, and we discuss the bottlenecks in the branchless vectorized median computation. We provide performance benchmarks in comparison to other implementations: a GPU median filter based on comparing a pivot value to all values, and the same branchless vectorized median implementation on CPUs. The comparison uses constant data, linear data, and random data. The runtime of BVM is independent of the data, and BVM is up to 4.6 times faster than the pivot median filter. Although the CUDA-based BVM filter is roughly 25% slower than a CPU-based (8-core) routine, it is still a cheaper solution for many applications. We also present factors that influence overall BVM performance, such as the array size, the number of arrays, and the filter size. An application of the median filter to ring artifact reduction is demonstrated; its processing time is up to 3.7 times faster than the optimized CPU-based (four-core) routine.
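The delete-and-insert scheme described in the abstract can be illustrated with a short Python sketch. This is our own formulation under stated assumptions, not the authors' CUDA code or necessarily the exact procedure of reference [1]: deletion uses one compare-and-select per position, insertion uses only min and max, and the window length m is odd. The function names (bvm_delete, bvm_insert, bvm_median_filter, pivot_median) are hypothetical. pivot_median is one plausible reading of "comparing a pivot value to all values"; unlike the BVM update, its early exit makes the amount of work depend on the data.

```python
from typing import List, Sequence


def bvm_delete(a: List[float], out: float) -> List[float]:
    """Remove one copy of `out` from the sorted list `a` (length n -> n-1).

    Assumes `out` is present in `a`. Each output position uses the same
    compare-and-select step, so the operation count is data-independent.
    """
    # Positions holding values < out stay; from the first value >= out
    # (which equals out) onward, everything shifts left by one.
    return [a[i] if a[i] < out else a[i + 1] for i in range(len(a) - 1)]


def bvm_insert(b: List[float], x: float) -> List[float]:
    """Insert `x` into the sorted list `b` (length n -> n+1) using only min/max."""
    n = len(b)
    if n == 0:
        return [x]
    c = [min(b[0], x)]
    # c[i] = max(b[i-1], min(b[i], x)) places x at its sorted position
    c += [max(b[i - 1], min(b[i], x)) for i in range(1, n)]
    c.append(max(b[n - 1], x))
    return c


def bvm_median_filter(signal: Sequence[float], m: int) -> List[float]:
    """Sliding median over windows of odd length m (valid region only)."""
    if m < 1 or m % 2 == 0 or m > len(signal):
        raise ValueError("m must be odd, positive, and at most len(signal)")
    window = sorted(signal[:m])  # initial sort of the first window
    result = [window[m // 2]]
    for j in range(m, len(signal)):
        # Drop the sample leaving the window, then add the one entering it.
        window = bvm_insert(bvm_delete(window, signal[j - m]), signal[j])
        result.append(window[m // 2])
    return result


def pivot_median(values: Sequence[float]) -> float:
    """Median found by comparing each candidate pivot with all values."""
    k = len(values) // 2
    for p in values:
        less = sum(v < p for v in values)
        equal = sum(v == p for v in values)
        if less <= k < less + equal:  # p occupies sorted index k
            return p  # data-dependent early exit
    raise ValueError("values must not be empty")
```

For example, bvm_median_filter([5, 1, 4, 2, 3, 9, 0], 3) yields [4, 2, 3, 3, 3]. In a GPU or SIMD implementation, the same fixed instruction sequence would presumably run in lockstep over many arrays (for example one image row per thread or vector lane), which is the kind of data-level parallelism the abstract refers to; the Python conditional expression here stands in for a hardware select instruction.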
Keywords :
biology computing; computerised tomography; medical image processing; CUDA-based BVM filter; NVIDIA compute unified device architecture; O(M ln M) computational complexity; O(M²) computational complexity; branchless vectorized median filter; data-level parallelism; fast accessing scheme; high performance median filtering; memory layout; modern commodity graphics processing units; pivot median filter; vectorized median computation; Computational complexity; Computer aided instruction; Filtering; Filters; Graphics; Hardware; Minimax techniques; Noise reduction; Runtime; Smoothing methods;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Nuclear Science Symposium Conference Record (NSS/MIC), 2009 IEEE
Conference_Location :
Orlando, FL
ISSN :
1095-7863
Print_ISBN :
978-1-4244-3961-4
Electronic_ISBN :
1095-7863
Type :
conf
DOI :
10.1109/NSSMIC.2009.5402323
Filename :
5402323