DocumentCode
1796997
Title
Enhanced power-normalized features for Mandarin robust speech recognition based on a voiced-unvoiced-silence decision
Author
Ying-Wei Tan ; Wen-Ju Liu ; Zhan-Lei Yang ; Ming-ming Chen
Author_Institution
Nat. Lab. of Pattern Recognition, Inst. of Autom., Beijing, China
fYear
2014
fDate
9-13 July 2014
Firstpage
222
Lastpage
226
Abstract
Power-normalized features have been shown to improve the performance of English large vocabulary continuous speech recognition under different acoustic conditions. In this paper, considering the tone characteristics of Mandarin speech, we apply a different strategy to each class of sound, based on a voiced-unvoiced-silence decision. For voiced sounds, harmonic enhancement based on a weighted harmonic-noise-model (WHNM) is applied to accurately capture the salient harmonic information and decrease the effect of various non-stationary noises; standard power-normalized processing (SPNP) is then performed. For unvoiced sounds, only SPNP is used. For silence, a quality frame dropping (FD) algorithm is incorporated into the front-end. The resulting enhanced power-normalized features are used to process noise-corrupted Mandarin speech. Experimental results show higher recognition accuracy for Mandarin continuous speech recognition in noisy environments than the ETSI Advanced Front-End (AFE).
Keywords
decision theory; speech processing; speech recognition; AFE; ETSI advanced front-end; English large vocabulary continuous speech recognition; FD algorithm; Mandarin continuous speech recognition; Mandarin robust speech recognition; SPNP; WHNM; acoustic conditions; enhanced power-normalized features; harmonic enhancement; noise-corrupted Mandarin speech processing; nonstationary noises; quality frame dropping algorithm; salient harmonic information; standard power-normalized processing; voiced sounds; voiced-unvoiced-silence decision; weighted harmonic-noise-model; Accuracy; Noise; Noise measurement; Robustness; Speech; Speech recognition; Telecommunication standards
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal and Information Processing (ChinaSIP), 2014 IEEE China Summit & International Conference on
Conference_Location
Xi'an
Print_ISBN
978-1-4799-5401-8
Type
conf
DOI
10.1109/ChinaSIP.2014.6889236
Filename
6889236