• DocumentCode
    3338255
  • Title

    High performance median filtering using commodity graphics hardware

  • Author

    Chen, Wei ; Beister, Marcel ; Kyriakou, Yiannis ; Kachelriess, M.

  • Author_Institution
    Inst. of Med. Phys. (IMP), Univ. of Erlangen-Nurnberg, Erlangen, Germany
  • fYear
    2009
  • fDate
    Oct. 24 2009-Nov. 1 2009
  • Firstpage
    4142
  • Lastpage
    4147
  • Abstract
    Median filtering is a commonly used technique in smoothing and denoising applications. Based on the vector programming model of modern commodity graphics processing units (GPUs), which directly support min/max operations, compare, and select as fundamental instructions, we implemented the branchless vectorized median (BVM) filter proposed in reference [1] using NVIDIA's compute unified device architecture (CUDA). The BVM filter keeps track of a sorted array from which values are deleted and into which new values are inserted. Although it has O(M^2) computational complexity, whereas other sort-based algorithms have O(M ln M) computational complexity, it may outperform other implementations, at least for typical data. The main reason is that the algorithm is branchless and makes use of data-level parallelism, so its runtime is data-independent and highly predictable. We describe some important criteria, such as the memory layout for a fast access scheme, and discuss the bottlenecks in the branchless vectorized median computation. We provide performance benchmarks in comparison to other implementations: a median filter on GPUs based on comparing a pivot value to all values, and the same branchless vectorized median implementation on CPUs. The comparison uses constant data, linear data, and random data. The runtime of BVM is independent of the data and is up to a factor of 4.6 faster than the pivot median filter. Although the CUDA-based BVM filter is roughly 25% slower than a CPU-based (8 cores) routine, it is still a cheaper solution for many applications. We also present some factors, such as the array size, the number of arrays, and the filter size, that influence the total BVM performance. An application of the median filter to ring artifact reduction is demonstrated. Processing is up to 3.7 times faster than with the optimized CPU-based (four cores) routine.
  • Keywords
    biology computing; computerised tomography; medical image processing; CUDA-based BVM filter; NVIDIA compute unified device architecture; O(M ln M) computational complexity; O(M^2) computational complexity; branchless vectorized median filter; computerised tomography; data-level parallelism; fast accessing scheme; high performance median filtering; memory layout; modern commodity graphics processing units; pivot median filter; vectorized median computation; Computational complexity; Computer aided instruction; Filtering; Filters; Graphics; Hardware; Minimax techniques; Noise reduction; Runtime; Smoothing methods
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Nuclear Science Symposium Conference Record (NSS/MIC), 2009 IEEE
  • Conference_Location
    Orlando, FL
  • ISSN
    1095-7863
  • Print_ISBN
    978-1-4244-3961-4
  • Electronic_ISBN
    1095-7863
  • Type

    conf

  • DOI
    10.1109/NSSMIC.2009.5402323
  • Filename
    5402323
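
The abstract describes a filter that maintains a sorted array, deleting the outgoing value and inserting the incoming one using only min/max, compare, and select. The following is a minimal sketch of that idea in plain Python, not the authors' CUDA implementation and not necessarily the exact scheme of reference [1]. The function names (`delete_sorted`, `insert_sorted`, `running_median`) are hypothetical, and the 1D sliding window is an assumption made for illustration. Every element is processed with the same operations regardless of the data, which is the property that would let such an update map onto vector hardware.

```python
def delete_sorted(s, old):
    """Remove one copy of `old` from the sorted list `s` (assumed to contain it).

    Each output slot is a compare followed by a select: slots left of `old`
    keep s[i], all others take the right neighbour s[i + 1].
    """
    return [s[i] if s[i] < old else s[i + 1] for i in range(len(s) - 1)]


def insert_sorted(s, new):
    """Insert `new` into the sorted list `s` using only min and max.

    Output slot i is `new` clamped to the interval [s[i-1], s[i]], with
    -inf / +inf sentinels standing in at the two ends.
    """
    lo = [float("-inf")] + s
    hi = s + [float("inf")]
    return [max(l, min(h, new)) for l, h in zip(lo, hi)]


def running_median(x, m):
    """Median of every length-m window of x (m odd), via sorted-array updates."""
    s = sorted(x[:m])            # one-time sort of the first window
    out = [s[m // 2]]
    for k in range(m, len(x)):
        # Slide the window: drop the oldest sample, add the newest one.
        s = insert_sorted(delete_sorted(s, x[k - m]), x[k])
        out.append(s[m // 2])
    return out


if __name__ == "__main__":
    print(running_median([5, 1, 4, 2, 3, 9, 7], 3))  # [4, 2, 3, 3, 7]
```

In this sketch the per-window update performs the same sequence of operations for constant, linear, or random input, which mirrors the data-independent runtime claimed in the abstract. Python's conditional expression merely stands in for a hardware select instruction here.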