Regional Information Center for Science and Technology - MMSE-Based Missing-Feature Reconstruction With Temporal Modeling for Robust Speech Recognition

DocumentCode :

76952

Title :

MMSE-Based Missing-Feature Reconstruction With Temporal Modeling for Robust Speech Recognition

Author :

Gonzalez, Jose A.; Peinado, Antonio M.; Ma, Ning; Gomez, Angel M.; Barker, J.

Author_Institution :

Dept. of Teor. de la Senal Telematica y Comun., Univ. de Granada, Granada, Spain

Volume :

Issue :

fYear :

2013

fDate :

March 2013

Firstpage :

624

Lastpage :

635

Abstract :

This paper addresses feature compensation in the log-spectral domain using the missing-data (MD) approach to noise-robust speech recognition, which assumes that each log-spectral feature is either almost unaffected by noise or completely masked by it. First, a general MD framework based on minimum mean square error (MMSE) estimation is introduced, which exploits the correlation across frequency bands to reconstruct the missing features. This framework allows different MD imputation approaches to be derived and, in particular, a novel technique that takes advantage of truncated Gaussian distributions is presented. While the proposed technique provides excellent results at high and medium signal-to-noise ratios (SNRs), its performance diminishes at low SNRs, where very few reliable features are available. The reconstruction technique is therefore extended to exploit temporal constraints using two different approaches. In the first, time-frequency patches of speech containing a number of consecutive frames are modeled using a Gaussian mixture model (GMM). In the second, the sequential structure of speech is instead modeled by a hidden Markov model (HMM). The proposed techniques are evaluated on the Aurora-2 and Aurora-4 databases using both oracle and estimated masks. In both cases, the proposed techniques outperform the baseline system and other related techniques in recognition performance. The introduction of temporal modeling also turns out to be very effective in reconstructing spectra at low SNRs. In particular, HMMs show the highest capability of accounting for time correlations and therefore achieve the best results.
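
The two ideas the abstract names, MMSE estimation that exploits correlation across frequency bands and the use of truncated Gaussian distributions, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes a single full-covariance Gaussian speech prior rather than a GMM or HMM, assumes that a masked band's clean log-energy is bounded above by the observed noisy value, and applies the truncation band by band to the conditional marginals, which is an approximation. The names `truncated_gaussian_mean` and `reconstruct_frame` are hypothetical.

```python
import math
import numpy as np

def truncated_gaussian_mean(mu, sigma, upper):
    """Mean of N(mu, sigma^2) restricted to x <= upper."""
    a = (upper - mu) / sigma
    if a < -30.0:
        # Far in the lower tail: use the asymptotic form to avoid underflow.
        return upper + sigma / a
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-a / math.sqrt(2.0))
    return mu - sigma * pdf / cdf

def reconstruct_frame(y, reliable, mean, cov):
    """Estimate clean log-spectral features for one frame.

    y        : observed noisy log-spectral vector
    reliable : boolean mask, True where the band is trusted as-is
    mean,cov : parameters of an assumed Gaussian prior over clean speech
    """
    y = np.asarray(y, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    r = np.asarray(reliable, dtype=bool)
    m = ~r
    x_hat = y.copy()          # reliable bands are kept unchanged
    if not m.any():
        return x_hat

    mu_c = mean[m]
    cov_c = cov[np.ix_(m, m)]
    if r.any():
        # Condition the missing bands on the reliable ones; this is where
        # the correlation across frequency bands is exploited.
        cov_mr = cov[np.ix_(m, r)]
        cov_rr = cov[np.ix_(r, r)]
        gain = np.linalg.solve(cov_rr, cov_mr.T).T   # cov_mr @ inv(cov_rr)
        mu_c = mu_c + gain @ (y[r] - mean[r])
        cov_c = cov_c - gain @ cov_mr.T

    # Approximation: truncate each conditional marginal at the observed value.
    sigma_c = np.sqrt(np.diag(cov_c))
    x_hat[m] = [truncated_gaussian_mean(mu, s, ub)
                for mu, s, ub in zip(mu_c, sigma_c, y[m])]
    return x_hat
```

With a two-band prior of zero mean, unit variances and covariance 0.8, a reliable first band and a masked second band, `reconstruct_frame([1.0, 0.0], [True, False], [0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]])` leaves the first band unchanged and places the second band's estimate below its observed upper bound of 0.0.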

Keywords :

Gaussian distribution; hidden Markov models; least mean squares methods; speech recognition; Aurora-2 databases; Aurora-4 databases; GMM; Gaussian mixture model; HMM; MD approach; MD imputation approaches; MMSE-based missing-feature reconstruction; feature compensation; hidden Markov model; log-spectral domain; log-spectral features; minimum mean square error estimation; missing-data approach; robust speech recognition; signal-to-noise ratios; speech containing; speech sequential structure; temporal constraints; temporal modeling; time-frequency patches; truncated Gaussian distributions; Correlation; Covariance matrix; Estimation; Noise; Reliability; Speech; missing-feature; spectral reconstruction

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2012.2229982

Filename :

6362180

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=76952