• DocumentCode
    1474845
  • Title
    Improved Speech Presence Probabilities Using HMM-Based Inference, With Applications to Speech Enhancement and ASR
  • Author
    Borgström, Bengt J. ; Alwan, Abeer
  • Author_Institution
    Electr. Eng. Dept., Univ. of California Los Angeles, Los Angeles, CA, USA
  • Volume
    4
  • Issue
    5
  • fYear
    2010
  • Firstpage
    808
  • Lastpage
    815
  • Abstract
    This paper presents a technique for determining improved speech presence probabilities (SPPs) by exploiting the temporal correlation present in spectral speech data. Based on a set of traditional SPPs, we estimate the underlying speech presence probability via statistical inference. Traditional SPPs are assumed to be observations of channel-specific two-state Markov models. The corresponding steady-state and transitional statistics are set to capture the well-known temporal correlation of spectral speech data, and observation statistics are modeled on the effect of additive acoustic noise on the resulting SPPs. Once the underlying models have been parameterized, improved speech presence probabilities can be estimated via traditional inference techniques, such as the forward or forward-backward algorithms. The two-state configuration of the underlying signal models enables low-complexity HMM-based processing that only slightly increases complexity relative to standard SPPs, making the proposed framework attractive for resource-constrained scenarios. The proposed SPP masks are shown to provide a significant increase in accuracy relative to the state-of-the-art method of Cohen and Berdugo (“Speech enhancement for non-stationary noise environments,” Signal Processing, vol. 81, no. 11, pp. 2403-2418, 2001), in terms of the mean pointwise Kullback-Leibler (KL) distance. When applied to soft-decision speech enhancement, the proposed SPPs show improved results in terms of segmental SNRs. Closer analysis reveals significantly decreased noise leakage, whereas speech distortion is increased. When applied to automatic speech recognition (ASR), soft-decision enhancement with the proposed SPPs provides increased recognition performance relative to the method of Cohen and Berdugo.
  • Keywords
    Markov processes; acoustic noise; speech enhancement; speech recognition; ASR; HMM-based inference; additive acoustic noise; automatic speech recognition; channel-specific two-state Markov models; forward-backward algorithms; soft-decision enhancement; spectral speech data; speech presence probabilities; temporal correlation; Additive noise; Automatic speech recognition; Hidden Markov models; Probability; Signal processing; Speech analysis; Speech enhancement; Statistics; Steady-state; Working environment noise; Automatic speech recognition (ASR); hidden Markov models (HMMs); noise suppression; soft-decision speech enhancement; speech presence probabilities (SPPs)
  • fLanguage
    English
  • Journal_Title
    Selected Topics in Signal Processing, IEEE Journal of
  • Publisher
    IEEE
  • ISSN
    1932-4553
  • Type
    jour
  • DOI
    10.1109/JSTSP.2010.2048605
  • Filename
    5451104
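
The forward-algorithm inference described in the abstract can be illustrated with a minimal sketch for a single frequency channel. This is not the authors' implementation: the function name `forward_spp`, the transition probabilities, the prior, and in particular the triangular observation likelihoods (2·o for speech, 2·(1−o) for noise) are all assumptions chosen for illustration, since the record does not give the paper's observation model or parameter values.

```python
def forward_spp(spps, p_stay_speech=0.9, p_stay_noise=0.9,
                prior_speech=0.5, eps=1e-6):
    """Refine per-frame SPPs for one channel with a two-state forward pass.

    States: 0 = speech absent, 1 = speech present.
    All parameter values are illustrative assumptions, not the paper's.
    """
    # trans[i][j] = P(state_t = j | state_{t-1} = i)
    trans = [
        [p_stay_noise, 1.0 - p_stay_noise],
        [1.0 - p_stay_speech, p_stay_speech],
    ]
    belief = [1.0 - prior_speech, prior_speech]
    refined = []
    for t, obs in enumerate(spps):
        # Clamp to avoid zero likelihood in both states at once.
        obs = min(max(obs, eps), 1.0 - eps)
        # Assumed observation model: triangular densities on [0, 1].
        like = [2.0 * (1.0 - obs), 2.0 * obs]
        if t == 0:
            pred = belief  # first frame uses the prior directly
        else:
            # Prediction step: propagate the belief through the chain.
            pred = [
                belief[0] * trans[0][j] + belief[1] * trans[1][j]
                for j in (0, 1)
            ]
        # Update step: weight by the likelihood and normalize.
        unnorm = [pred[j] * like[j] for j in (0, 1)]
        total = unnorm[0] + unnorm[1]
        belief = [u / total for u in unnorm]
        refined.append(belief[1])
    return refined


if __name__ == "__main__":
    # An isolated high SPP surrounded by low values is pulled down.
    print(forward_spp([0.1, 0.1, 0.9, 0.1, 0.1]))
```

Because each channel has only two states, every frame costs a constant number of multiplications, which is consistent with the abstract's remark that complexity rises only slightly over standard SPPs. A forward-backward variant would add a second, reverse pass and therefore requires the whole utterance (or a look-ahead buffer).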