DocumentCode :
154121
Title :
A GPU-Based Algorithm-Specific Optimization for High-Performance Background Subtraction
Author :
Chulian Zhang ; Tabkhi, Hamed ; Schirner, Gunar
Author_Institution :
Dept. of Electr. & Comput. Eng., Northeastern Univ., Boston, MA, USA
fYear :
2014
fDate :
9-12 Sept. 2014
Firstpage :
182
Lastpage :
191
Abstract :
Background subtraction is an essential first stage in many vision applications, differentiating foreground pixels from the background scene, with Mixture of Gaussians (MoG) being a widely used implementation choice. MoG's high computation demand renders a real-time single-threaded realization infeasible. With its pixel-level parallelism, deploying MoG on top of parallel architectures such as a Graphics Processing Unit (GPU) is promising. However, MoG poses many challenges, having significant control flow (potentially reducing GPU efficiency) as well as a significant memory bandwidth demand. In this paper, we propose a GPU implementation of MoG that surpasses real-time processing for full HD (1080p 60 Hz). This paper describes step-wise optimizations starting from general GPU optimizations (such as memory coalescing and overlapping computation with communication), via algorithm-specific optimizations including control flow reduction and register usage optimization, to windowed optimization utilizing shared memory. For each optimization, this paper evaluates the performance potential and identifies architectural bottlenecks. Our CUDA-based implementation improves performance over a sequential implementation by 57×, 97× and 101× through general, algorithm-specific, and windowed optimizations respectively, without affecting output quality.
Keywords :
Gaussian processes; computer vision; graphics processing units; parallel architectures; real-time systems; shared memory systems; CUDA-based implementation; GPU efficiency; GPU implementation; GPU optimization; GPU-based algorithm-specific optimization; MoG; architectural bottleneck; control flow reduction; foreground pixel; graphics processing unit; high-performance background subtraction; memory bandwidth demand; mixture of Gaussians; parallel architecture; performance potential; pixel level parallelism; real-time processing; real-time single threaded realization; register usage optimization; sequential implementation; shared memory; stepwise optimization; vision application; windowed optimization; Algorithm design and analysis; Computer architecture; Data transfer; Graphics processing units; Instruction sets; Kernel; Optimization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Processing (ICPP), 2014 43rd International Conference on
Conference_Location :
Minneapolis, MN
ISSN :
0190-3918
Type :
conf
DOI :
10.1109/ICPP.2014.27
Filename :
6957227