• DocumentCode
    1806337
  • Title
    Audio-visual face detection for tracking in a meeting room environment
  • Author
    Barnard, Mark ; Wang, Wenwu ; Kittler, Josef ; Naqvi, Syed Mohsen ; Chambers, Jonathon
  • Author_Institution
    Centre for Vision, Speech & Signal Process. (CVSSP), Univ. of Surrey, Guildford, UK
  • fYear
    2013
  • fDate
    9-12 July 2013
  • Firstpage
    1222
  • Lastpage
    1227
  • Abstract
    A key task in many applications such as tracking or face recognition is the detection and localisation of a subject's face in an image. This can still prove to be a challenging task, particularly in low-resolution or noisy images. Here we propose a robust method for face detection using both audio and visual information. We construct a dictionary-learning-based face detector using a set of distinctive and robust image features. We then train a support vector machine classifier using sparse image representations produced by this dictionary to classify face versus background. This is combined with the azimuth angle of the speaker produced by an audio localisation system to constrain the search space for the subject's face. This increases the efficiency of the detection and localisation process by limiting the search area. More importantly, however, the audio information allows us to know a priori the number of subjects in the image. This greatly reduces the possibility of false positive face detections. We demonstrate the advantage of this proposed approach over traditional face detection methods on the challenging AV16.3 dataset.
  • Keywords
    face recognition; feature extraction; image classification; learning (artificial intelligence); object detection; object tracking; speaker recognition; support vector machines; audio information; audio localisation system; audio-visual face detection; detection process; dictionary learning based face detector; face recognition; false positive face detections; image features; localisation process; low resolution image; noisy image; object tracking; sparse image representations; speaker azimuth angle; support vector machine classifier; visual information; Dictionaries; Face; Face detection; Feature extraction; Histograms; Vectors; Visualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Fusion (FUSION), 2013 16th International Conference on
  • Conference_Location
    Istanbul
  • Print_ISBN
    978-605-86311-1-3
  • Type
    conf
  • Filename
    6641136
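
The two audio constraints described in the abstract (limiting the search area with the speaker's azimuth, and using the known number of speakers to cap the number of detections) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a pinhole camera whose optical axis corresponds to azimuth 0, a known horizontal field of view, and an angular tolerance around the audio estimate; the function names `azimuth_to_column`, `search_columns`, and `keep_top_n` and all parameters are hypothetical.

```python
import math


def azimuth_to_column(azimuth_deg, image_width, hfov_deg):
    """Map an azimuth (degrees, 0 = optical axis, positive = right)
    to a pixel column, assuming an ideal pinhole camera."""
    # Focal length in pixels, derived from the assumed horizontal field of view.
    focal_px = (image_width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    return image_width / 2.0 + focal_px * math.tan(math.radians(azimuth_deg))


def search_columns(azimuth_deg, image_width, hfov_deg, tolerance_deg):
    """Return the (left, right) column range to scan for a face,
    clipped to the image, given an azimuth estimate and a tolerance."""
    lo = azimuth_to_column(azimuth_deg - tolerance_deg, image_width, hfov_deg)
    hi = azimuth_to_column(azimuth_deg + tolerance_deg, image_width, hfov_deg)
    left = max(0, int(math.floor(lo)))
    right = min(image_width, int(math.ceil(hi)))
    return left, right


def keep_top_n(detections, n_speakers):
    """Keep only the n highest-scoring detections, where n is the number
    of speakers reported by the audio system.
    Each detection is a (score, box) tuple."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    return ranked[:n_speakers]


if __name__ == "__main__":
    # Hypothetical values: 360-pixel-wide frame, 90 degree field of view.
    print(search_columns(0.0, 360, 90.0, 10.0))
    print(keep_top_n([(0.2, "a"), (0.9, "b"), (0.5, "c")], 2))
```

A face classifier would then be run only on windows whose columns fall inside the returned range, and any detections beyond the audio-reported speaker count would be discarded as likely false positives.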