DocumentCode :
1695250
Title :
Rich system combination for keyword spotting in noisy and acoustically heterogeneous audio streams
Author :
Akbacak, Murat ; Burget, Lukas ; Wen Wang ; van Hout, Julien
Author_Institution :
Microsoft, Sunnyvale, CA, USA
fYear :
2013
Firstpage :
8267
Lastpage :
8271
Abstract :
We address the problem of retrieving spoken information from noisy and heterogeneous audio archives using system combination with a rich and diverse set of noise-robust modules. Audio search applications so far have focused on constrained domains or genres and on relatively clean, homogeneous acoustic or channel conditions. In this paper, our focus is to improve the accuracy of a keyword spotting system in highly degraded and diverse channel conditions by employing multiple recognition systems in parallel with different robust frontends and modeling choices, as well as different representations during audio indexing and search (words vs. subword units). After aligning keyword hits from different systems, we employ system combination at the score level using a logistic-regression-based classifier. Side information, such as the output of an acoustic condition identification module, is used to guide the combination system, which is trained on a held-out dataset. Lattice-based indexing and search are used in all keyword spotting systems. We present improvements in miss probability at a fixed false-alarm probability by employing our proposed rich system combination approach on DARPA Robust Automatic Transcription of Speech (RATS) Phase-I evaluation data, which contain highly degraded channel recordings (signal-to-noise ratios as low as 0 dB) and different channel characteristics.
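The score-level combination described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Hit` record, the time-overlap alignment rule, the default score of 0.0 for a system that produced no hit, the one-hot encoding of the acoustic condition, and the plain gradient-descent trainer are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    """One keyword detection from one system (hypothetical record layout)."""
    keyword: str
    start: float  # seconds
    end: float    # seconds
    score: float  # system confidence in [0, 1]


def overlaps(a, b):
    """Two hits align if they share a keyword and overlap in time (assumed rule)."""
    return a.keyword == b.keyword and a.start < b.end and b.start < a.end


def align_hits(system_hits, missing_score=0.0):
    """Group hits from several systems into one score vector per detection.

    system_hits: one list of Hit per system.
    Returns a list of dicts with a reference "span" and per-system "scores".
    """
    n_systems = len(system_hits)
    clusters = []
    for idx, hits in enumerate(system_hits):
        for hit in hits:
            for cluster in clusters:
                if overlaps(hit, cluster["span"]):
                    cluster["scores"][idx] = max(cluster["scores"][idx], hit.score)
                    break
            else:
                scores = [missing_score] * n_systems
                scores[idx] = hit.score
                clusters.append({"span": hit, "scores": scores})
    return clusters


def with_condition(scores, condition_id, n_conditions):
    """Append a one-hot acoustic-condition vector as side information (assumed encoding)."""
    one_hot = [0.0] * n_conditions
    one_hot[condition_id] = 1.0
    return list(scores) + one_hot


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(features, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on held-out data."""
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    m = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def fuse(w, b, x):
    """Fused detection score for one aligned feature vector."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
```

In this sketch, each aligned detection becomes a vector of per-system scores (optionally extended by `with_condition`), `train_logreg` is fit on labeled held-out detections, and `fuse` yields the combined score that would then be thresholded to trade off misses against false alarms.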
Keywords :
audio signal processing; indexing; information retrieval; regression analysis; speech recognition; DARPA robust automatic transcription of speech; RATS; acoustic condition identification module; acoustically heterogeneous audio stream; audio indexing; audio search applications; diverse channel; held out dataset; keyword spotting; lattice based indexing; lattice based search; logistic regression based classifier; multiple recognition system; noisy audio stream; rich system combination; side information; spoken information retrieval; Abstracts; Acoustics; Lattices; Noise measurement; Radio access networks; Keyword spotting; acoustic noise; channel degradation; fusion; robust audio search; spoken term detection
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6639277
Filename :
6639277