• DocumentCode
    154121
  • Title

    A GPU-Based Algorithm-Specific Optimization for High-Performance Background Subtraction

  • Author

    Zhang, Chulian ; Tabkhi, Hamed ; Schirner, Gunar

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Northeastern Univ., Boston, MA, USA
  • fYear
    2014
  • fDate
    9-12 Sept. 2014
  • Firstpage
    182
  • Lastpage
    191
  • Abstract
    Background subtraction is an essential first stage in many vision applications, differentiating foreground pixels from the background scene, with Mixture of Gaussians (MoG) being a widely used implementation choice. MoG's high computation demand renders a real-time single-threaded realization infeasible. With its pixel-level parallelism, deploying MoG on top of parallel architectures such as a Graphics Processing Unit (GPU) is promising. However, MoG poses challenges: it has significant control flow (potentially reducing GPU efficiency) as well as a significant memory bandwidth demand. In this paper, we propose a GPU implementation of MoG that surpasses real-time processing for full HD (1080p 60 Hz). This paper describes step-wise optimizations starting from general GPU optimizations (such as memory coalescing and computation and communication overlapping), via algorithm-specific optimizations including control flow reduction and register usage optimization, to windowed optimization utilizing shared memory. For each optimization, this paper evaluates the performance potential and identifies architectural bottlenecks. Our CUDA-based implementation improves performance over a sequential implementation by 57×, 97× and 101× through general, algorithm-specific, and windowed optimizations respectively, without affecting the output quality.
  • Keywords
    Gaussian processes; computer vision; graphics processing units; parallel architectures; real-time systems; shared memory systems; CUDA-based implementation; GPU efficiency; GPU implementation; GPU optimization; GPU-based algorithm-specific optimization; MoG; architectural bottleneck; control flow reduction; foreground pixel; graphics processing unit; high-performance background subtraction; memory bandwidth demand; mixture of Gaussians; parallel architecture; performance potential; pixel level parallelism; real-time processing; real-time single threaded realization; register usage optimization; sequential implementation; shared memory; stepwise optimization; vision application; windowed optimization; Algorithm design and analysis; Computer architecture; Data transfer; Graphics processing units; Instruction sets; Kernel; Optimization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing (ICPP), 2014 43rd International Conference on
  • Conference_Location
    Minneapolis, MN
  • ISSN
    0190-3918
  • Type

    conf

  • DOI
    10.1109/ICPP.2014.27
  • Filename
    6957227