Enhanced power-normalized features for Mandarin robust speech recognition based on a voiced-unvoiced-silence decision

Author

Ying-Wei Tan ; Wen-Ju Liu ; Zhan-Lei Yang ; Ming-ming Chen

Author_Institution

Nat. Lab. of Pattern Recognition, Inst. of Autom., Beijing, China

fYear

2014

fDate

9-13 July 2014

Firstpage

222

Lastpage

226

Abstract

Power-normalized features have been shown to improve the performance of English large vocabulary continuous speech recognition under different acoustic conditions. In this paper, considering the tonal characteristics of Mandarin speech, we apply a different strategy to each class of sound, based on a voiced-unvoiced-silence decision. For voiced sounds, harmonic enhancement based on a weighted harmonic-noise-model (WHNM) is applied to accurately capture the salient harmonic information and decrease the effect of various non-stationary noises, after which standard power-normalized processing (SPNP) is performed. For unvoiced sounds, only the SPNP is used. For silence, a quality frame dropping (FD) algorithm is incorporated into the front-end. The resulting enhanced power-normalized features are used to process noise-corrupted Mandarin speech. Experimental results show better recognition accuracy for Mandarin continuous speech recognition in noisy environments than the ETSI Advanced Front-End (AFE).
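
The per-class routing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the energy and zero-crossing-rate rule in classify_frame is a classical stand-in for the paper's voiced-unvoiced-silence decision, the thresholds are arbitrary, and the enhance and spnp callables are hypothetical placeholders for the WHNM harmonic enhancement and the standard power-normalized processing, neither of which is specified in this record.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]


def classify_frame(frame: Frame,
                   energy_floor: float = 1e-3,
                   zcr_split: float = 0.25) -> str:
    """Label a frame as 'silence', 'voiced', or 'unvoiced'.

    Assumption: a simple energy / zero-crossing-rate heuristic, used here
    only as a stand-in for the paper's actual decision rule.
    """
    n = len(frame)
    if n == 0:
        return "silence"
    energy = sum(x * x for x in frame) / n
    if energy < energy_floor:
        return "silence"
    # Count sign changes between neighbouring samples.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / max(n - 1, 1)
    # Low zero-crossing rate suggests periodic (voiced) speech.
    return "voiced" if zcr < zcr_split else "unvoiced"


def front_end(frames: Sequence[Frame],
              enhance: Callable[[Frame], Frame],
              spnp: Callable[[Frame], object]) -> List[object]:
    """Route each frame by class, following the abstract's description.

    voiced   -> enhance (placeholder for WHNM), then spnp
    unvoiced -> spnp only
    silence  -> dropped (placeholder for the frame dropping step)
    """
    features = []
    for frame in frames:
        label = classify_frame(frame)
        if label == "silence":
            continue
        if label == "voiced":
            frame = enhance(frame)
        features.append(spnp(frame))
    return features
```

In this sketch, silence frames produce no feature vector at all, so the output can be shorter than the input; a real front-end would also need to keep frame timing consistent for the recognizer.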

Keywords

decision theory; speech processing; speech recognition; AFE; ETSI advanced front-end; English large vocabulary continuous speech recognition; FD algorithm; Mandarin continuous speech recognition; Mandarin robust speech recognition; SPNP; WHNM; acoustic conditions; enhanced power-normalized features; harmonic enhancement; noise-corrupted Mandarin speech processing; nonstationary noises; quality frame dropping algorithm; salient harmonic information; standard power-normalized processing; voiced sounds; voiced-unvoiced-silence decision; weighted harmonic-noise-model; Accuracy; Noise; Noise measurement; Robustness; Speech; Speech recognition; Telecommunication standards

fLanguage

English

Publisher

IEEE

Conference_Titel

Signal and Information Processing (ChinaSIP), 2014 IEEE China Summit & International Conference on

Conference_Location

Xi'an

Print_ISBN

978-1-4799-5401-8

Type

conf

DOI

10.1109/ChinaSIP.2014.6889236

Filename

6889236