• DocumentCode
    76952
  • Title

    MMSE-Based Missing-Feature Reconstruction With Temporal Modeling for Robust Speech Recognition

  • Author

    Gonzalez, Jose A. ; Peinado, Antonio M. ; Ma, Ning ; Gomez, Angel M. ; Barker, J.

  • Author_Institution
    Dept. of Teor. de la Senal Telematica y Comun., Univ. de Granada, Granada, Spain
  • Volume
    21
  • Issue
    3
  • fYear
    2013
  • fDate
    March 2013
  • Firstpage
    624
  • Lastpage
    635
  • Abstract
    This paper addresses feature compensation in the log-spectral domain using the missing-data (MD) approach to noise-robust speech recognition, which assumes that each log-spectral feature is either almost unaffected by noise or completely masked by it. First, a general MD framework based on minimum mean square error (MMSE) estimation is introduced, which exploits the correlation across frequency bands to reconstruct the missing features. This framework allows the derivation of different MD imputation approaches and, in particular, a novel technique that takes advantage of truncated Gaussian distributions is presented. While the proposed technique provides excellent results at high and medium signal-to-noise ratios (SNRs), its performance diminishes at low SNRs, where very few reliable features are available. The reconstruction technique is therefore extended to exploit temporal constraints using two different approaches. In the first approach, time-frequency patches of speech containing a number of consecutive frames are modeled using a Gaussian mixture model (GMM). In the second, the sequential structure of speech is instead modeled by a hidden Markov model (HMM). The proposed techniques are evaluated on the Aurora-2 and Aurora-4 databases using both oracle and estimated masks. In both cases, the proposed techniques outperform the baseline system and other related techniques in recognition performance. The introduction of temporal modeling also turns out to be very effective in reconstructing spectra at low SNRs. In particular, HMMs show the highest capability of accounting for time correlations and therefore achieve the best results.
  • Keywords
    Gaussian distribution; hidden Markov models; least mean squares methods; speech recognition; Aurora-2 databases; Aurora-4 databases; GMM; Gaussian mixture model; HMM; MD approach; MD imputation approaches; MMSE-based missing-feature reconstruction; feature compensation; hidden Markov model; log-spectral domain; log-spectral features; minimum mean square error estimation; missing-data approach; robust speech recognition; signal-to-noise ratios; speech containing; speech sequential structure; temporal constraints; temporal modeling; time-frequency patches; truncated Gaussian distributions; Correlation; Covariance matrix; Estimation; Noise; Reliability; Speech; missing-feature; spectral reconstruction
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2012.2229982
  • Filename
    6362180
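
The abstract describes MMSE reconstruction of masked log-spectral features from reliable ones, using a GMM and truncated Gaussian distributions. The following is a minimal illustrative sketch of that general idea, not the paper's actual algorithm: it assumes a toy two-band feature vector (one reliable band, one masked band), a hand-specified GMM with full 2x2 covariances, and the common missing-data assumption that the clean masked value cannot exceed the observed noisy value. All function names, parameters, and the weighting of component posteriors by the bound probability are assumptions made for illustration.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_mean(mu, sigma, upper):
    """Mean of N(mu, sigma^2) restricted to x <= upper."""
    a = (upper - mu) / sigma
    cdf = max(norm_cdf(a), 1e-300)  # guard against division by zero
    # The exact truncated mean never exceeds the bound; clamp for numerical safety.
    return min(upper, mu - sigma * norm_pdf(a) / cdf)


def mmse_reconstruct(x_r, y_u, components):
    """MMSE estimate of a masked band given a reliable band (toy 2-band case).

    x_r: reliable log-spectral value.
    y_u: observed noisy value of the masked band, used as an upper bound.
    components: list of (weight, mu_r, mu_u, var_r, var_u, cov_ru) tuples
                (hypothetical GMM parameters).
    """
    scores, estimates = [], []
    for w, mu_r, mu_u, var_r, var_u, cov_ru in components:
        # Gaussian conditional of the masked band given the reliable band.
        c_mu = mu_u + cov_ru / var_r * (x_r - mu_r)
        c_sigma = math.sqrt(var_u - cov_ru ** 2 / var_r)
        # Component score: prior * likelihood of reliable band * P(x_u <= y_u).
        lik_r = norm_pdf((x_r - mu_r) / math.sqrt(var_r)) / math.sqrt(var_r)
        bound = norm_cdf((y_u - c_mu) / c_sigma)
        scores.append(w * lik_r * bound)
        estimates.append(truncated_mean(c_mu, c_sigma, y_u))
    total = sum(scores)
    # Posterior-weighted sum of per-component truncated means.
    return sum(s / total * m for s, m in zip(scores, estimates))


if __name__ == "__main__":
    gmm = [
        (0.6, 2.0, 3.0, 1.0, 1.5, 0.8),
        (0.4, 6.0, 7.0, 2.0, 2.0, 1.2),
    ]
    print(mmse_reconstruct(x_r=5.0, y_u=6.5, components=gmm))
```

Because each per-component estimate is a truncated mean bounded by `y_u`, the combined estimate also stays at or below the observed noisy value. A realistic system would operate on full filterbank vectors with many mixture components, and the temporal extensions mentioned in the abstract (multi-frame patches, HMMs) are not covered by this sketch.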