• DocumentCode
    1796997
  • Title

    Enhanced power-normalized features for Mandarin robust speech recognition based on a voiced-unvoiced-silence decision

  • Author

    Ying-Wei Tan ; Wen-Ju Liu ; Zhan-Lei Yang ; Ming-ming Chen

  • Author_Institution
    Nat. Lab. of Pattern Recognition, Inst. of Autom., Beijing, China
  • fYear
    2014
  • fDate
    9-13 July 2014
  • Firstpage
    222
  • Lastpage
    226
  • Abstract
    Power-normalized features have been shown to improve the performance of English large vocabulary continuous speech recognition under a variety of acoustic conditions. In this paper, taking the tonal characteristics of Mandarin speech into account, we adopt different processing strategies for different sounds based on a voiced-unvoiced-silence decision. For voiced sounds, harmonic enhancement based on a weighted harmonic-noise model (WHNM) is applied to accurately capture the salient harmonic information and to reduce the effect of various non-stationary noises; standard power-normalized processing (SPNP) is then performed. For unvoiced sounds, only SPNP is applied. For silence, a quality frame dropping (FD) algorithm is incorporated into the front-end. The resulting enhanced power-normalized features are used to process noise-corrupted Mandarin speech. Experimental results show that they yield better recognition accuracy than the ETSI Advanced Front-End (AFE) for Mandarin continuous speech recognition in noisy environments.
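    The per-frame routing described in the abstract (voiced: harmonic enhancement then SPNP; unvoiced: SPNP only; silence: dropped) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the voiced-unvoiced-silence decision uses a simple short-time energy and zero-crossing-rate rule with hypothetical thresholds, SPNP is stood in for by a PNCC-style power-law nonlinearity with an assumed exponent of 1/15, the WHNM enhancement is a hypothetical identity placeholder, and frame dropping simply discards every frame labeled silence.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical thresholds (assumption): the abstract does not give the actual decision rule.
ENERGY_SILENCE_DB = -50.0   # frames quieter than this are labeled silence
ZCR_UNVOICED = 0.25         # frames crossing zero more often than this are labeled unvoiced
POWER_EXPONENT = 1.0 / 15.0  # assumed PNCC-style power-law exponent standing in for SPNP


def frame_energy_db(frame: Sequence[float]) -> float:
    """Mean short-time energy of a frame, in dB (small floor avoids log(0))."""
    energy = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(energy + 1e-12)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def classify_frame(frame: Sequence[float]) -> str:
    """Toy voiced-unvoiced-silence decision (assumption, not the paper's classifier)."""
    if frame_energy_db(frame) < ENERGY_SILENCE_DB:
        return "silence"
    if zero_crossing_rate(frame) > ZCR_UNVOICED:
        return "unvoiced"
    return "voiced"


def harmonic_enhance(power_spectrum: Sequence[float]) -> List[float]:
    """Hypothetical placeholder for WHNM harmonic enhancement: returns the input unchanged."""
    return list(power_spectrum)


def power_normalize(power_spectrum: Sequence[float]) -> List[float]:
    """Stand-in for SPNP: compress each power value with a power-law nonlinearity."""
    return [p ** POWER_EXPONENT for p in power_spectrum]


def process_frame(frame: Sequence[float],
                  power_spectrum: Sequence[float]) -> Optional[List[float]]:
    """Route one frame according to its label; None means the frame is dropped."""
    label = classify_frame(frame)
    if label == "silence":
        return None  # simplified frame dropping: discard all silence frames
    spectrum = list(power_spectrum)
    if label == "voiced":
        spectrum = harmonic_enhance(spectrum)  # voiced: enhance harmonics first
    return power_normalize(spectrum)          # voiced and unvoiced: SPNP


def process_utterance(frames: Sequence[Sequence[float]],
                      spectra: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply process_frame to every frame, keeping only non-dropped feature vectors."""
    features = []
    for frame, spectrum in zip(frames, spectra):
        feat = process_frame(frame, spectrum)
        if feat is not None:
            features.append(feat)
    return features
```

    Keeping the classifier, enhancement, and normalization as separate functions makes it easy to swap any hypothetical stand-in here for a real WHNM, SPNP, or FD component without changing the routing logic.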
  • Keywords
    decision theory; speech processing; speech recognition; AFE; ETSI advanced front-end; English large vocabulary continuous speech recognition; FD algorithm; Mandarin continuous speech recognition; Mandarin robust speech recognition; SPNP; WHNM; acoustic conditions; enhanced power-normalized features; harmonic enhancement; noise-corrupted Mandarin speech processing; nonstationary noises; quality frame dropping algorithm; salient harmonic information; standard power-normalized processing; voiced sounds; voiced-unvoiced-silence decision; weighted harmonic-noise-model; Accuracy; Noise; Noise measurement; Robustness; Speech; Speech recognition; Telecommunication standards
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Signal and Information Processing (ChinaSIP), 2014 IEEE China Summit & International Conference on
  • Conference_Location
    Xi'an
  • Print_ISBN
    978-1-4799-5401-8
  • Type

    conf

  • DOI
    10.1109/ChinaSIP.2014.6889236
  • Filename
    6889236