Filterbank-energy estimation using mixture and Markov models for recognition of noisy speech

Author

Erell, Adoram ; Weintraub, Mitchel

Author_Institution

SRI Inst., Menlo Park, CA, USA

Volume

1

Issue

1

fYear

1993

fDate

1/1/1993 12:00:00 AM

Firstpage

68

Lastpage

76

Abstract

An estimation algorithm for noise-robust speech recognition, the minimum mean log spectral distance (MMLSD), is presented. The estimation is matched to the recognizer by seeking to minimize the average distortion as measured by a Euclidean distance between filterbank log-energy vectors, approximating the weighted-cepstral distance used by the recognizer. The estimation is computed using a clean speech spectral probability distribution, estimated from a database, and a stationary ARMA model for the noise. When trained on clean speech and tested with additive white noise at 10-dB SNR, the recognition accuracy with the MMLSD algorithm is comparable to that achieved with training the recognizer at the same constant 10-dB SNR. The algorithm is also highly efficient with a quasi-stationary environmental noise, recorded with a desktop microphone, and requires almost no tuning to differences between this noise and the computer-generated white noise.
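The idea of estimating clean filterbank log-energies so that the expected Euclidean distance to the true vector is small can be illustrated with a toy posterior-mean estimator. The sketch below is not the MMLSD algorithm from the paper: it assumes a discrete set of clean-speech codewords in place of the paper's mixture and Markov models, a known constant noise log-energy per channel in place of the ARMA noise model, and an isotropic Gaussian observation error. All names (`log_add`, `mixture_posterior_mean`, `sigma`) are hypothetical.

```python
import math

def log_add(a, b):
    """Return log(exp(a) + exp(b)) in a numerically stable way."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def euclidean(a, b):
    """Euclidean distance between two log-energy vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mixture_posterior_mean(noisy, noise, codewords, weights, sigma=1.0):
    """Toy posterior-mean estimate of clean filterbank log-energies.

    Assumptions (illustrative, not taken from the paper):
      - clean speech is one of a few codewords with prior `weights`;
      - noise adds in the energy domain, so a noisy channel is
        log(exp(clean) + exp(noise));
      - the observation error is Gaussian with standard deviation `sigma`.
    Under this toy model the posterior mean minimizes the expected
    squared Euclidean distance to the clean vector.
    """
    log_post = []
    for mu, w in zip(codewords, weights):
        # Noisy log-energies this codeword would produce under the noise model.
        predicted = [log_add(m, n) for m, n in zip(mu, noise)]
        sq_dist = sum((y - p) ** 2 for y, p in zip(noisy, predicted))
        log_post.append(math.log(w) - sq_dist / (2.0 * sigma ** 2))
    # Normalize the posterior, subtracting the maximum for stability.
    top = max(log_post)
    post = [math.exp(lp - top) for lp in log_post]
    total = sum(post)
    post = [p / total for p in post]
    # Posterior-weighted average of the clean codewords.
    return [sum(p * mu[d] for p, mu in zip(post, codewords))
            for d in range(len(noisy))]

# Usage: two hypothetical 2-channel codewords and noise at log-energy 0.
codewords = [[5.0, 1.0], [1.0, 5.0]]
weights = [0.5, 0.5]
noise = [0.0, 0.0]
clean = codewords[0]
noisy = [log_add(c, n) for c, n in zip(clean, noise)]
estimate = mixture_posterior_mean(noisy, noise, codewords, weights)
```

In this usage the estimate lies closer to the clean vector than the noisy observation does, because nearly all posterior weight falls on the matching codeword.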

Keywords

Markov processes; filtering and prediction theory; speech recognition; white noise; 10 dB; ARMA model; Euclidean distance; MMLSD algorithm; Markov models; SNR; additive white noise; average distortion; clean speech; computer-generated white noise; database; desktop microphone; estimation algorithm; filterbank energy estimation; filterbank log-energy vectors; minimum mean log spectral distance algorithm; noise robust speech recognition; noisy speech; quasistationary environmental noise; recognition accuracy; speech spectral probability distribution; weighted-cepstral distance; Distortion measurement; Distributed computing; Filter bank; Noise robustness; Probability distribution; Signal to noise ratio; Speech enhancement; Working environment noise

fLanguage

English

Journal_Title

Speech and Audio Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1063-6676

Type

jour

DOI

10.1109/89.221385

Filename

221385