DocumentCode :
178408
Title :
UT-Vocal Effort II: Analysis and constrained-lexicon recognition of whispered speech
Author :
Ghaffarzadegan, Shabnam ; Boril, Hynek ; Hansen, John H. L.
Author_Institution :
Center for Robust Speech Systems (CRSS), University of Texas at Dallas, Richardson, TX, USA
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
2544
Lastpage :
2548
Abstract :
This study focuses on acoustic variations in speech introduced by whispering, and proposes several strategies to improve robustness of automatic speech recognition of whispered speech with neutral-trained acoustic models. In the analysis part, differences in neutral and whispered speech captured in the UT-Vocal Effort II corpus are studied in terms of energy, spectral slope, and formant center frequency and bandwidth distributions in silence, voiced, and unvoiced speech signal segments. In the part dedicated to speech recognition, several strategies involving front-end filter bank redistribution, cepstral dimensionality reduction, and lexicon expansion for alternative pronunciations are proposed. The proposed neutral-trained system employing redistributed filter bank and reduced features provides a 7.7% absolute WER reduction over the baseline system trained on neutral speech, and a 1.3% reduction over a baseline system with whisper-adapted acoustic models.
Keywords :
acoustic signal processing; cepstral analysis; channel bank filters; data reduction; speech recognition; text analysis; UT-vocal effort II corpus; WER reduction; acoustic variation; alternative pronunciations; automatic speech recognition; bandwidth distribution; baseline system; cepstral dimensionality reduction; constrained lexicon recognition; formant center frequency; front-end filter bank redistribution; lexicon expansion; neutral speech; neutral trained acoustic model; silence speech signal segment; spectral slope; unvoiced speech signal segment; whisper adapted acoustic model; whispered speech; Adaptation models; Mel frequency cepstral coefficient; Speech; Speech processing; Speech recognition; Whisper speech recognition; filter-bank optimization; speech analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6854059
Filename :
6854059