DocumentCode
3338255
Title
High performance median filtering using commodity graphics hardware
Author
Chen, Wei ; Beister, Marcel ; Kyriakou, Yiannis ; Kachelries, M.
Author_Institution
Inst. of Med. Phys. (IMP), Univ. of Erlangen-Nurnberg, Erlangen, Germany
fYear
2009
fDate
Oct. 24 2009-Nov. 1 2009
Firstpage
4142
Lastpage
4147
Abstract
Median filtering is a commonly used technique in smoothing and denoising applications. Modern commodity graphics processing units (GPUs) follow a vector programming model and directly support min/max, compare, and select operations as fundamental instructions. Building on this, we implemented the branchless vectorized median (BVM) filter proposed in reference [1] using NVIDIA's compute unified device architecture (CUDA). The BVM filter keeps track of a sorted array from which values are deleted and into which new values are inserted. Although its computational complexity is O(M^2), whereas sorting-based algorithms achieve O(M ln M), it may outperform other implementations, at least for typical data. The main reason is that the algorithm is branchless and exploits data-level parallelism, so its runtime is data-independent and highly predictable. We describe important design criteria, such as the memory layout for a fast access scheme, and discuss the bottlenecks in the branchless vectorized median computation. We provide performance benchmarks in comparison to two other implementations: a median filter on GPUs based on comparing a pivot value to all values, and the same branchless vectorized median implementation on CPUs. The comparison uses constant data, linear data, and random data. The runtime of BVM is independent of the data, and BVM is up to 4.6 times faster than the pivot median filter. Although the CUDA-based BVM filter is roughly 25% slower than a CPU-based (8 cores) routine, it is still a cheaper solution for many applications. We also present factors that influence the total BVM performance, such as the array size, the number of arrays, and the filter size. An application of median filtering to ring artifact reduction is demonstrated, where processing is up to 3.7 times faster than with the optimized CPU-based (four cores) routine.
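The sorted-array update described in the abstract can be illustrated with a minimal host-side C++ sketch. This is not the authors' CUDA implementation and not necessarily the exact scheme of reference [1]; it is one possible way to delete a value from, and insert a value into, a sorted window using only min, max, and select operations with loop bounds that do not depend on the data. The names replace_sorted and sliding_median are hypothetical, the window size m is assumed to be odd, the input is assumed to contain no NaN values, and only positions where the window fits entirely inside the signal are filtered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: replace one occurrence of `out` in the ascending
// array `s` by `in`, keeping `s` sorted. Assumes `out` is present in `s`.
// The loop trip counts depend only on the window size, never on the values.
void replace_sorted(std::vector<float>& s, float out, float in) {
    const std::size_t m = s.size();
    assert(m > 0);
    if (m == 1) { s[0] = in; return; }  // depends on size only, not on data

    // Deletion: every element from the first one >= out onward takes the
    // value of its right neighbour. The ternary is a select, which
    // compilers typically lower to a conditional move rather than a jump.
    for (std::size_t i = 0; i + 1 < m; ++i)
        s[i] = (s[i] >= out) ? s[i + 1] : s[i];

    // Insertion into the m-1 surviving values s[0..m-2], from the top down:
    // new[i] = max(old[i-1], min(old[i], in)).
    s[m - 1] = std::max(s[m - 2], in);
    for (std::size_t i = m - 2; i >= 1; --i)
        s[i] = std::max(s[i - 1], std::min(s[i], in));
    s[0] = std::min(s[0], in);
}

// Hypothetical 1D sliding median built on replace_sorted.
// Returns x.size() - m + 1 values (no border handling).
std::vector<float> sliding_median(const std::vector<float>& x, std::size_t m) {
    assert(m % 2 == 1 && m <= x.size());
    // The first window is sorted once; this setup step is not branchless.
    std::vector<float> window(x.begin(),
                              x.begin() + static_cast<std::ptrdiff_t>(m));
    std::sort(window.begin(), window.end());

    std::vector<float> y;
    y.reserve(x.size() - m + 1);
    y.push_back(window[m / 2]);
    for (std::size_t k = m; k < x.size(); ++k) {
        replace_sorted(window, x[k - m], x[k]);  // drop oldest, add newest
        y.push_back(window[m / 2]);              // median is the middle entry
    }
    return y;
}
```

Each window update costs a fixed number of min/max/select operations proportional to m, regardless of the values involved, which is the property the abstract credits for the data-independent runtime. On a GPU, one such sorted window would typically be kept per thread or per vector lane; that mapping is not shown here.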
Keywords
biology computing; computerised tomography; medical image processing; CUDA-based BVM filter; NVIDIA compute unified device architecture; O(M ln M) computational complexity; O(M^2) computational complexity; branchless vectorized median filter; computerised tomography; data-level parallelism; fast accessing scheme; high performance median filtering; memory layout; modern commodity graphics processing units; pivot median filter; vectorized median computation; Computational complexity; Computer aided instruction; Filtering; Filters; Graphics; Hardware; Minimax techniques; Noise reduction; Runtime; Smoothing methods
fLanguage
English
Publisher
IEEE
Conference_Titel
Nuclear Science Symposium Conference Record (NSS/MIC), 2009 IEEE
Conference_Location
Orlando, FL
ISSN
1095-7863
Print_ISBN
978-1-4244-3961-4
Electronic_ISBN
1095-7863
Type
conf
DOI
10.1109/NSSMIC.2009.5402323
Filename
5402323