• DocumentCode
    3166250
  • Title
    Model-based noise reduction leveraging frequency-wise confidence metric for in-car speech recognition
  • Author
    Ichikawa, Osamu ; Rennie, Steven J. ; Fukuda, Takashi ; Nishimura, Masafumi
  • Author_Institution
    IBM Res. - Tokyo, Yamato, Japan
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    4921
  • Lastpage
    4924
  • Abstract
    Model-based approaches to noise reduction effectively improve the performance of automatic speech recognition in noisy environments. Most of them estimate de-noised speech under the Minimum Mean Square Error (MMSE) criterion. In general, an observation in the Mel spectral domain contains both speech-dominant bands and noise-dominant bands. This paper introduces a method that gives more weight to speech-dominant bands when evaluating the posterior probability of each speech state, since these bands are generally more reliable. To leverage high-resolution information in the Mel domain, we use Local Peak Weight (LPW) as the confidence metric for the degree of speech dominance. The same information is also used to regulate the amount of compensation applied to each frequency band during feature reconstruction under an integrated probabilistic model. On an isolated-word task with car noise, the method produced relative word error rate improvements of up to 33.8% over the baseline MMSE method.
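    The abstract describes two uses of a per-band confidence: weighting bands when computing the posterior of each speech state, and scaling how much compensation each band receives. The sketch below illustrates both ideas under stated assumptions; it is not the authors' formulation. It assumes a diagonal-covariance GMM prior over clean log-Mel features, a simple log-add model for the noisy-speech mean, an externally supplied per-band confidence `conf` in [0, 1] standing in for the LPW-derived metric (LPW itself is not computed here), and a linear scaling of the MMSE compensation by `1 - conf`. All function names and parameters are hypothetical.

```python
import math
from typing import List, Sequence


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def weighted_posteriors(
    y: Sequence[float],                   # observed log-Mel frame, one value per band
    noise: Sequence[float],               # assumed log-Mel noise estimate per band
    priors: Sequence[float],              # GMM mixture weights (clean-speech states)
    means: Sequence[Sequence[float]],     # clean-speech means, [component][band]
    variances: Sequence[Sequence[float]], # diagonal variances, [component][band]
    conf: Sequence[float],                # hypothetical per-band confidence in [0, 1]
) -> List[float]:
    """p(k | y), with each band's log-likelihood scaled by its confidence."""
    scores = []
    for prior, mu_k, var_k in zip(priors, means, variances):
        s = math.log(prior)
        for d, y_d in enumerate(y):
            # Assumed log-add model: noisy mean = log(exp(clean mean) + exp(noise)).
            mu_y = logaddexp(mu_k[d], noise[d])
            # Low-confidence (noise-dominant) bands contribute less to the score.
            s += conf[d] * log_gauss(y_d, mu_y, var_k[d])
        scores.append(s)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def denoise(
    y: Sequence[float],
    noise: Sequence[float],
    priors: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
    conf: Sequence[float],
) -> List[float]:
    """MMSE-style clean estimate with per-band compensation scaled by 1 - conf."""
    post = weighted_posteriors(y, noise, priors, means, variances, conf)
    x_hat = []
    for d, y_d in enumerate(y):
        # Posterior-weighted mismatch between noisy and clean means in band d.
        g = sum(p * (logaddexp(mu_k[d], noise[d]) - mu_k[d])
                for p, mu_k in zip(post, means))
        # Assumption: confident (speech-dominant) bands receive less compensation.
        x_hat.append(y_d - (1.0 - conf[d]) * g)
    return x_hat
```

    Scaling each band's log-likelihood by `conf[d]` lets noise-dominant bands (confidence near 0) drop out of the state posterior instead of dominating it, which matches the abstract's rationale that speech-dominant bands are more reliable. With `conf` set to 0 in every band, the posteriors reduce to the mixture priors and every band receives full compensation.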
  • Keywords
    least mean squares methods; signal denoising; speech recognition; LPW; MMSE criterion; Mel spectral domain; automatic speech recognition; baseline MMSE method; de-noised speech estimates; feature reconstruction; in-car speech recognition; integrated probabilistic model; local peak weight; minimum mean square estimate; model-based approach; model-based noise reduction leveraging frequency-wise confidence metric; noise-dominant bands; noisy environments; posterior probability; relative word error rate; speech-dominant bands; speech-dominated bands; Harmonic analysis; Noise; Noise measurement; Noise reduction; Speech; Speech recognition; missing feature; model-based noise reduction; robust speech recognition; speech enhancement
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6289023
  • Filename
    6289023