• DocumentCode
    1668454
  • Title

    Using The Voice Spectrum For Improved Tracking Of People In A Joint Audio-Video Scheme

  • Author

    D'Arca, Eleonora ; Robertson, Neil M. ; Hopgood, James

  • Author_Institution
    Joint Res. Inst. for Signal & Image Process., Heriot-Watt Univ., Edinburgh, UK
  • fYear
    2013
  • Firstpage
    3622
  • Lastpage
    3626
  • Abstract
    In this paper we present a new solution to the problem of speaker tracking among people where occlusions occur (disappearance and non-speaking). In a normal conversation between two or more people, we learn speaker mel-frequency cepstral coefficients (MFCCs) and incorporate this information into a sequential Bayesian audio-video position tracker. The joint video-to-audio data association step is thus improved, and we achieve robust person recognition, which in turn aids tracking performance. We provide a comprehensive evaluation via simulations and real data, quoting tracking accuracy, precision and diarisation error rate (DER) compared to ground truth. For simulated and real experiments in an open space, the trajectory tracking performance increases by 20%, measured against ground truth, using our approach. As a further enhancement versus the state of the art, speaker identity recognition at a distance is improved by 20% by exploiting audio-video localisation cues.
  • Keywords
    audio signal processing; speaker recognition; video signal processing; aids tracking performance; audio-video localisation cues; data quoting tracking accuracy; diarisation error rate; joint audio-video scheme; mel-cepstral coefficients; sequential Bayesian audio-video position tracker; speaker identity recognition; speaker tracking; trajectory tracking performance; video-to-audio data association step; voice spectrum; Accuracy; Cameras; Density estimation robust algorithm; Speaker recognition; Speech; Target tracking; Trajectory; Distant Speaker Recognition; EKF; MFCC; Multimodal tracking; Speaker Tracking;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2013.6638333
  • Filename
    6638333