• DocumentCode
    35973
  • Title
    Robust Log-Energy Estimation and its Dynamic Change Enhancement for In-car Speech Recognition
  • Author
    Weifeng Li ; Longbiao Wang ; Yicong Zhou ; Hervé Bourlard ; Qingmin Liao
  • Author_Institution
    Dept. of Electron. Eng., Tsinghua Univ., Shenzhen, China
  • Volume
    21
  • Issue
    8
  • fYear
    2013
  • fDate
    Aug. 2013
  • Firstpage
    1689
  • Lastpage
    1698
  • Abstract
    The log-energy parameter, typically derived from a full-band spectrum, is a critical feature commonly used in automatic speech recognition (ASR) systems. However, log-energy is difficult to estimate reliably in the presence of background noise. In this paper, we theoretically show that background noise affects the trajectories not only of the “conventional” log-energy but also of its delta parameters. This results in poor estimates of the actual log-energy and its delta parameters, which then no longer describe the speech signal. We therefore propose a new method to estimate log-energy from a sub-band spectrum, followed by dynamic change enhancement and mean smoothing. We demonstrate the effectiveness of the proposed log-energy estimation and its post-processing steps through speech recognition experiments conducted on the in-car CENSREC-2 database. The proposed log-energy (together with its corresponding delta parameters) yields an average improvement of 32.8% compared with the baseline front-ends. Moreover, we show that further improvement can be achieved by incorporating the new Mel-Frequency Cepstral Coefficients (MFCCs) obtained by non-linear spectral contrast stretching.
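The abstract's central observation, that additive background noise distorts both the log-energy trajectory and its delta parameters, can be illustrated numerically. The Python sketch below is not the authors' sub-band estimator, dynamic change enhancement, or mean smoothing (this record does not give their details). It only computes the conventional full-band log-energy and standard regression deltas for a hypothetical toy utterance, with and without an assumed stationary noise floor. The frame levels, the 8-bin flat spectra, the noise power `NOISE = 0.5`, and the ±2-frame delta window are all illustrative assumptions.

```python
import math


def log_energy(power_spectrum):
    """Conventional full-band log-energy of one frame: log of the summed power-spectrum bins."""
    return math.log(sum(power_spectrum))


def deltas(traj, n=2):
    """Regression-based delta coefficients over a +/- n frame window.

    Edge frames are handled by repeating the first/last value (a common convention).
    """
    T = len(traj)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for t in range(T):
        num = sum(k * (traj[min(t + k, T - 1)] - traj[max(t - k, 0)])
                  for k in range(1, n + 1))
        out.append(num / denom)
    return out


# Hypothetical toy utterance: 8 frames x 8 bins of flat spectra whose per-bin
# power rises from "silence" to a loud "speech" segment and back down.
BINS = 8
frame_levels = [0.01, 0.01, 1.0, 10.0, 10.0, 1.0, 0.01, 0.01]
clean = [[level] * BINS for level in frame_levels]

# Assumed additive, stationary noise: constant power in every bin of every frame.
NOISE = 0.5
noisy = [[p + NOISE for p in frame] for frame in clean]

clean_le = [log_energy(f) for f in clean]
noisy_le = [log_energy(f) for f in noisy]
clean_d = deltas(clean_le)
noisy_d = deltas(noisy_le)

print(f"log-energy range  clean: {max(clean_le) - min(clean_le):.2f}  "
      f"noisy: {max(noisy_le) - min(noisy_le):.2f}")
print(f"max |delta|       clean: {max(map(abs, clean_d)):.2f}  "
      f"noisy: {max(map(abs, noisy_d)):.2f}")
```

Because ln((a + n)/(b + n)) < ln(a/b) for a > b > 0 and n > 0, the noise floor shrinks every frame-to-frame log-energy difference. In this toy case the clean log-energy range is ln(1000) ≈ 6.91, the noisy range is ln(84/4.08) ≈ 3.02, and the largest delta magnitude falls from about 2.07 to about 0.91. This is the kind of flattening of the log-energy and delta trajectories that the abstract attributes to background noise.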
  • Keywords
    noise; speech enhancement; speech recognition; automatic speech recognition systems; background noise; delta parameters; dynamic change enhancement; full-band spectrum; in-car CENSREC-2 database; in-car speech recognition; log-energy estimation; mean smoothing; nonlinear spectral contrast stretching; robust log-energy estimation; speech signal; sub-band spectrum; log-energy; mel-filterbank (MFB); mel-frequency cepstral coefficients (MFCCs)
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Audio, Speech, and Language Processing
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2013.2260151
  • Filename
    6508817