Title :
Automatic temporal alignment of AV data with confidence estimation
Author :
Korchagin, Danil ; Garner, Philip N. ; Dines, John
Author_Institution :
Idiap Res. Inst., Martigny, Switzerland
Abstract :
In this paper, we propose a new approach to automatic audio-based temporal alignment, with confidence estimation, of audio-visual data recorded by different cameras, camcorders or mobile phones during social events. All recorded data is temporally aligned, using ASR-related features, with a common master track recorded by a reference camera, and the corresponding alignment confidence is estimated. The core of the algorithm is based on perceptual time-frequency analysis with a precision of 10 ms. The results show correct alignment in 99% of cases for a real-life dataset and surpass the performance of cross-correlation while keeping system requirements lower.
Keywords :
audio signal processing; audio-visual systems; speech recognition; time-frequency analysis; video cameras; ASR-related features; AV data; audio-visual data; automatic audio-based temporal alignment; camcorders; confidence estimation; mobile phones; perceptual time-frequency analysis; reference camera; Automatic speech recognition; Clocks; High definition video; Layout; Mel frequency cepstral coefficient; Phase change materials; Smart cameras; Synchronization; Testing; Time frequency analysis; pattern matching; reliability estimation; time synchronisation
Conference_Title :
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4244-4295-9
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2010.5495953