• DocumentCode
    2806233
  • Title
    Automatic temporal alignment of AV data with confidence estimation
  • Author
    Korchagin, Danil; Garner, Philip N.; Dines, John
  • Author_Institution
    Idiap Res. Inst., Martigny, Switzerland
  • fYear
    2010
  • fDate
    14-19 March 2010
  • Firstpage
    269
  • Lastpage
    272
  • Abstract
    In this paper, we propose a new approach for automatic audio-based temporal alignment, with confidence estimation, of audio-visual data recorded by different cameras, camcorders or mobile phones during social events. All recorded data is temporally aligned, based on ASR-related features, with a common master track recorded by a reference camera, and the corresponding confidence of alignment is estimated. The core of the algorithm is based on perceptual time-frequency analysis with a precision of 10 ms. The results show correct alignment in 99% of cases for a real-life dataset and surpass the performance of cross-correlation while keeping lower system requirements.
  • Keywords
    audio signal processing; audio-visual systems; speech recognition; time-frequency analysis; video cameras; ASR-related features; AV data; audio-visual data; automatic audio-based temporal alignment; camcorders; confidence estimation; mobile phones; perceptual time-frequency analysis; reference camera; Automatic speech recognition; Clocks; High definition video; Layout; Mel frequency cepstral coefficient; Phase change materials; Smart cameras; Synchronization; Testing; Time frequency analysis; pattern matching; reliability estimation; time synchronisation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-4295-9
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2010.5495953
  • Filename
    5495953
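
  • Illustration
    The abstract describes aligning each recording to a master track using frame-level audio features at 10 ms resolution and attaching a confidence to the result. The sketch below shows the general idea only and is not the authors' algorithm: per-frame log-energy stands in for the ASR-related features, a normalized correlation over frame offsets stands in for the matching step, and the confidence (margin between the best score and the best score outside its neighbourhood) is a hypothetical choice. The function names `frame_log_energy` and `align` are invented for this example.

```python
import numpy as np

def frame_log_energy(signal, sample_rate, hop_s=0.010):
    """Log-energy per non-overlapping 10 ms frame.

    Assumption: a crude stand-in for the ASR-related features in the abstract.
    """
    signal = np.asarray(signal, dtype=float)
    hop = int(round(sample_rate * hop_s))
    n_frames = len(signal) // hop
    frames = signal[: n_frames * hop].reshape(n_frames, hop)
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

def align(master_feat, clip_feat, guard=5):
    """Return (offset_in_frames, confidence) of the clip within the master."""
    n = len(clip_feat)
    n_offsets = len(master_feat) - n + 1
    if n_offsets <= 0:
        raise ValueError("clip must not be longer than the master track")

    # Normalize the clip once; normalize each master window inside the loop.
    clip = (clip_feat - clip_feat.mean()) / (clip_feat.std() + 1e-10)
    scores = np.empty(n_offsets)
    for k in range(n_offsets):
        win = master_feat[k : k + n]
        win = (win - win.mean()) / (win.std() + 1e-10)
        scores[k] = np.mean(win * clip)

    best = int(np.argmax(scores))

    # Hypothetical confidence: how far the best score stands above every
    # candidate outside a small guard band around it.
    mask = np.ones(n_offsets, dtype=bool)
    mask[max(0, best - guard) : best + guard + 1] = False
    runner_up = scores[mask].max() if mask.any() else -1.0
    return best, float(scores[best] - runner_up)
```

    Multiplying the returned frame offset by 0.010 gives the estimated start time of the clip within the master track in seconds. A low confidence value would suggest that several offsets match about equally well and that the alignment should not be trusted.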