Audio-visual face detection for tracking in a meeting room environment

Author

Barnard, Mark ; Wang, Wenwu ; Kittler, Josef ; Naqvi, Syed Mohsen ; Chambers, Jonathon

Author_Institution

Centre for Vision, Speech & Signal Process. (CVSSP), Univ. of Surrey, Guildford, UK

fYear

2013

fDate

9-12 July 2013

Firstpage

1222

Lastpage

1227

Abstract

A key task in many applications such as tracking or face recognition is the detection and localisation of a subject's face in an image. This can still prove to be a challenging task particularly in low resolution or noisy images. Here we propose a robust method for face detection using both audio and visual information. We construct a dictionary learning based face detector using a set of distinctive and robust image features. We then train a support vector machine classifier using sparse image representations produced by this dictionary to classify face versus background. This is combined with the azimuth angle of the speaker produced by an audio localisation system to constrain the search space for the subject's face. This increases the efficiency of the detection and localisation process by limiting the search area. However, more importantly, the audio information allows us to know a priori the number of subjects in the image. This greatly reduces the possibility of false positive face detections. We demonstrate the advantage of this proposed approach over traditional face detection methods on the challenging AV16.3 dataset.

Keywords

face recognition; feature extraction; image classification; learning (artificial intelligence); object detection; object tracking; speaker recognition; support vector machines; audio information; audio localisation system; audio-visual face detection; detection process; dictionary learning based face detector; false positive face detections; image features; localisation process; low resolution image; noisy image; sparse image representations; speaker azimuth angle; support vector machine classifier; visual information; Dictionaries; Face; Face detection; Feature extraction; Histograms; Vectors; Visualization

fLanguage

English

Publisher

IEEE

Conference_Titel

Information Fusion (FUSION), 2013 16th International Conference on

Conference_Location

Istanbul

Print_ISBN

978-605-86311-1-3

Type

conf

Filename

6641136