DocumentCode :
3280359
Title :
Video tracking through occlusions by fast audio source localisation
Author :
D'Arca, Eleonora ; Hughes, Ashley ; Robertson, Neil M. ; Hopgood, James
Author_Institution :
Joint Res. Inst. for Signal & Image Process., Heriot-Watt Univ. & Univ. of Edinburgh, Edinburgh, UK
fYear :
2013
fDate :
15-18 Sept. 2013
Firstpage :
2660
Lastpage :
2664
Abstract :
In this paper we present a novel audio-visual speaker detection and localisation algorithm. Audio source position estimates are computed by a novel stochastic region contraction (SRC) audio search algorithm for accurate speaker localisation. This audio search algorithm is aided by available video information, which estimates head heights over the whole scene; the resulting variant, stochastic region contraction with height estimation (SRC-HE), gives a speed improvement of 56% over SRC. Finally, we combine audio and video data in a Kalman filter (KF), which fuses person-position likelihoods and tracks the speaker. Our system is composed of a single video camera and 16 microphones. We validate the approach on the problem of video occlusion, i.e. two people having a conversation have to be detected and localised at a distance (as in surveillance scenarios, as opposed to enclosed meeting rooms). We show that video occlusion can be resolved and that speakers can be correctly detected and localised in real data. Moreover, SRC-HE based joint audio-video (AV) speaker tracking outperforms tracking based on the original SRC by 16% and 4% in terms of multi-object tracking precision (MOTP) and multi-object tracking accuracy (MOTA), respectively. Speaker change detection improves by 11% over SRC.
Keywords :
Kalman filters; audio-visual systems; microphone arrays; natural scenes; object tracking; search problems; speaker recognition; stochastic processes; video surveillance; KF; Kalman filter; MOTA; MOTP; SRC audio search algorithm; SRC-HE based joint audio-video speaker tracking; audio data; audio source position estimation; audio-source localisation; audio-visual speaker detection algorithm; audio-visual speaker localisation algorithm; enclosed meeting rooms; head height estimation; height estimation; microphones; multiobject tracking accuracy; multiobject tracking precision; person-position likelihood fusion; speaker change detection improvement; speaker tracking; stochastic region contraction audio search algorithm; surveillance scenarios; video camera; video data; video information; video occlusion problem; video tracking; Multimodal tracking; Optimization methods; Sampling Methods; Speaker Tracking; Video Tracking;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Image Processing (ICIP), 2013 20th IEEE International Conference on
Conference_Location :
Melbourne, VIC
Type :
conf
DOI :
10.1109/ICIP.2013.6738548
Filename :
6738548