Title :
Accelerating Boosting-Based Face Detection on GPUs
Author :
Oro, David ; Fernández, C. ; Segura, Carlos ; Martorell, Xavier ; Hernando, Javier
Author_Institution :
Herta Security, Barcelona, Spain
Abstract :
The goal of face detection is to determine whether faces are present in arbitrary images, along with their locations and dimensions. As with any graphics workload, these algorithms benefit from data-level parallelism. Existing parallelization efforts focus strictly on mapping different divide-and-conquer strategies onto multicore CPUs and GPUs. However, even the most advanced single-chip many-core processors to date still struggle to handle real-time face detection effectively under high-definition video workloads. To address this challenge, face detection algorithms typically avoid computations by dynamically evaluating a boosted cascade of classifiers. Unfortunately, this technique yields low ALU occupancy on architectures such as GPUs, which rely heavily on large SIMD widths to maximize data-level parallelism. In this paper we present several techniques to increase the performance of the cascade evaluation kernel, which is the most resource-intensive part of the face detection pipeline. In particular, using concurrent kernel execution in combination with cascades generated with the GentleBoost algorithm solves the problem of GPU underutilization and achieves, on average, a 5X speedup on 1080p videos over the fastest known implementations, while slightly improving accuracy. Finally, we also study the parallelization of the cascade training process and its scalability on SMP platforms. The proposed parallelization strategy exploits both task-level and data-level parallelism and achieves a 3.5X speedup over single-threaded implementations.
Keywords :
face recognition; graphics processing units; high definition video; multiprocessing systems; parallel programming; training; video signal processing; 5X speedup; ALU; GPU; GPU underutilization; GentleBoost algorithm; SIMD; SMP platforms; accelerating boosting-based face detection; advanced single-chip many-core processors; boosted classifiers cascade; cascade evaluation kernel; cascade training process; concurrent kernel execution; divide and conquer strategies; data-level parallelism; face detection pipeline; graphics workloads; high-definition video workloads; multicore CPU; parallelization strategy; real-time face detection; resource-intensive part; single-threaded implementations; Face; Face detection; Graphics processing unit; Instruction sets; Kernel; Parallel processing; video processing
Conference_Title :
2012 41st International Conference on Parallel Processing (ICPP)
Conference_Location :
Pittsburgh, PA
Print_ISBN :
978-1-4673-2508-0
DOI :
10.1109/ICPP.2012.12