Title :
Automatic recognition of wideband telephone speech with limited amount of matched training data
Author :
Bauer, Pavol ; Abel, Johannes ; Fischer, V. ; Fingscheidt, Tim
Author_Institution :
Inst. for Commun. Technol., Tech. Univ. Braunschweig, Braunschweig, Germany
Abstract :
Automatic speech recognition (ASR) for wideband (WB) telephone speech services must cope with a lack of matching speech databases for acoustic model training. This paper investigates the impact of mixing an insufficient amount of WB speech training data with additional narrowband (NB) speech training data. Decimation and interpolation techniques, which reduce the bandwidth mismatch between the NB speech material used in training and the WB speech data to be recognized, fail to outperform the pure NB ASR baseline. However, true WB ASR training supported by artificial bandwidth extension (ABE) yields a performance gain. A new ABE approach that makes use of robust dynamic features and a Viterbi path decoder exploiting phonetic a priori knowledge proves to be superior. It yields a word error rate reduction of 1.9 % relative to the NB ASR baseline and of 9.3 % relative to a WB ASR experiment trained on only a limited amount of WB speech data.
Keywords :
Viterbi decoding; interpolation; speech coding; speech recognition; telephony; ABE approach; NB speech material; Viterbi path decoder; WB speech data; acoustic model training; artificial bandwidth extension; automatic speech recognition; bandwidth mismatch reduction; decimation techniques; insufficient WB mixing; interpolation techniques; matched training data; narrowband speech training data; phonetic; robust dynamic features; speech database matching; true WB ASR training; wideband telephone speech services; word error rate; Acoustics; Hidden Markov models; Speech; Speech recognition; Training; Training data; bandwidth extension
Conference_Title :
2014 Proceedings of the 22nd European Signal Processing Conference (EUSIPCO)
Conference_Location :
Lisbon