Title :
softSAD: Integrated frame-based speech confidence for speaker recognition
Author :
McLaren, Mitchell ; Graciarena, Martin ; Yun Lei
Author_Institution :
Speech Technol. & Res. Lab., SRI Int., Menlo Park, CA, USA
Abstract :
In this paper we propose softSAD: the direct integration of speech posteriors into a speaker recognition system as an alternative to using speech activity detection (SAD). Motivated by the need to use audio from short recordings more efficiently, softSAD removes the need to discard audio based on thresholded speech/non-speech decisions, as is done with SAD. Instead, softSAD explicitly integrates a speech posterior for each frame into the Baum-Welch statistics. We compare softSAD and SAD in mismatched conditions by evaluating a system developed for the National Institute of Standards and Technology (NIST) 2012 speaker recognition evaluation (SRE) on the short test conditions of the channel-degraded Robust Automatic Transcription of Speech (RATS) speaker identification task, and vice versa. We demonstrate that softSAD provides benefit over SAD for short test audio in mismatched conditions.
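The contrast between SAD and softSAD described in the abstract can be illustrated with a minimal sketch. It assumes that the speech posterior enters the Baum-Welch statistics as a multiplicative per-frame weight on each Gaussian component's occupancy; the abstract does not spell out the exact formulation, so this is an illustration of the idea, not the authors' implementation. The function names, the 0.5 threshold, and the toy inputs are all hypothetical.

```python
def baum_welch_stats(frames, component_posteriors, frame_weights):
    """Accumulate zeroth- and first-order statistics with a per-frame weight.

    frames:               list of feature vectors (one list of floats per frame)
    component_posteriors: per-frame posteriors over the GMM components
    frame_weights:        one weight per frame (assumed role of the speech posterior)
    """
    num_components = len(component_posteriors[0])
    dim = len(frames[0])
    zeroth = [0.0] * num_components
    first = [[0.0] * dim for _ in range(num_components)]
    for x, gammas, weight in zip(frames, component_posteriors, frame_weights):
        for c, gamma in enumerate(gammas):
            occupancy = weight * gamma  # weighted component occupancy
            zeroth[c] += occupancy
            for d in range(dim):
                first[c][d] += occupancy * x[d]
    return zeroth, first


def hard_sad_weights(speech_posteriors, threshold=0.5):
    """Conventional SAD: keep a frame fully or discard it (threshold is illustrative)."""
    return [1.0 if p > threshold else 0.0 for p in speech_posteriors]


def soft_sad_weights(speech_posteriors):
    """softSAD-style weighting (assumed): every frame contributes by its posterior."""
    return list(speech_posteriors)


# Toy data: two 2-dimensional frames, two GMM components.
frames = [[1.0, 2.0], [3.0, 4.0]]
component_posteriors = [[0.5, 0.5], [1.0, 0.0]]
speech_posteriors = [0.8, 0.4]

hard_zeroth, hard_first = baum_welch_stats(
    frames, component_posteriors, hard_sad_weights(speech_posteriors))
soft_zeroth, soft_first = baum_welch_stats(
    frames, component_posteriors, soft_sad_weights(speech_posteriors))
```

With the hard decision, the second frame (speech posterior 0.4) is dropped entirely, whereas the soft weighting lets it contribute in proportion to its posterior, which is the property the abstract motivates for short recordings.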
Keywords :
speaker recognition; speech processing; statistical analysis; Baum-Welch statistics; National Institute of Standards and Technology 2012 speaker recognition evaluation; channel-degraded RATS speaker identification task; channel-degraded robust automatic transcription-of-speech speaker identification task; direct speech posterior integration; integrated frame-based speech confidence; softSAD; speaker recognition system; speech activity detection; Hidden Markov models; NIST; RATS; Speaker recognition; Speech; Speech processing; Speech recognition; Speech activity detection; mismatched conditions; speaker identification; unseen conditions;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7178861