Author_Institution :
Dept. of Electr. Eng., Univ. of California, Los Angeles, CA, USA
Abstract :
A feature compensation (FC) algorithm based on polynomial regression of utterance signal-to-noise ratio (SNR) is proposed for noise-robust automatic speech recognition (ASR). In this algorithm, the bias between clean and noisy speech features is approximated by a set of polynomials in the utterance SNR. The polynomials are estimated from adaptation data collected in the new environment, using the expectation-maximization (EM) algorithm under the maximum likelihood (ML) criterion. During recognition, the utterance SNR of the speech signal is first estimated, and the noisy speech features are then compensated using the regression polynomials. The compensated features are decoded with acoustic HMMs trained on clean data. Comparative experiments between FC and maximum likelihood linear regression (MLLR) are performed on the Aurora 2 (English) database and the German part of the Aurora 3 database. In the Aurora 2 experiments, two MLLR implementations are considered: one pools adaptation data across all SNRs, and the other uses three distinct SNR clusters. For each type of noise, FC achieves, on average, a word error rate reduction of 16.7% and 16.5% for Set A, and 20.5% and 14.6% for Set B, compared to the first and second MLLR implementations, respectively. For each SNR condition, FC achieves, on average, a word error rate reduction of 33.1% and 34.5% for Set A, and 23.6% and 21.4% for Set B. Results on the Aurora 3 database show that the best FC performance exceeds that of MLLR by 15.9%, 3.0% and 14.6% for the well-matched, medium-mismatched and high-mismatched conditions, respectively.
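The compensation step described above can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not state: the bias is defined as noisy minus clean, a single polynomial per feature dimension is used, and the coefficients are already known (in the proposed algorithm they are estimated by EM from adaptation data, which is not shown here). The function names `poly_bias` and `compensate` and all numeric values are hypothetical.

```python
from typing import List, Sequence


def poly_bias(coeffs: Sequence[float], snr_db: float) -> float:
    """Evaluate c0 + c1*snr + c2*snr^2 + ... using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * snr_db + c
    return result


def compensate(
    noisy_frames: Sequence[Sequence[float]],
    coeffs_per_dim: Sequence[Sequence[float]],
    snr_db: float,
) -> List[List[float]]:
    """Subtract an SNR-dependent bias from every frame of one utterance.

    Assumption: bias = noisy - clean, so clean is approximated by
    noisy - bias(snr). One polynomial per feature dimension.
    """
    # The utterance SNR is fixed, so the bias is computed once per dimension.
    biases = [poly_bias(c, snr_db) for c in coeffs_per_dim]
    return [[y - b for y, b in zip(frame, biases)] for frame in noisy_frames]


# Hypothetical coefficients: dimension 0 is first order, dimension 1 second order.
coeffs = [[1.0, -0.05], [0.5, 0.0, 0.001]]
# Two frames of two-dimensional noisy features (made-up values).
frames = [[2.0, 3.0], [4.0, 5.0]]
compensated = compensate(frames, coeffs, snr_db=10.0)
print(compensated)
```

Because the SNR is estimated once per utterance, the same bias vector is applied to all frames of that utterance; only the polynomial coefficients need to be re-estimated for a new environment.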
Keywords :
acoustic signal processing; decoding; error statistics; hidden Markov models; optimisation; polynomial approximation; regression analysis; speech recognition; ASR; acoustic HMM; approximation; automatic speech recognition; expectation-maximization algorithm; feature compensation algorithm; hidden Markov model; maximum likelihood criterion; polynomial regression; speech signal; word error rate; Acoustic noise; Error analysis; Maximum likelihood estimation; Maximum likelihood linear regression; Noise robustness; Polynomials; Signal to noise ratio; Working environment noise; Feature compensation; noise robust speech recognition; signal-to-noise ratio (SNR) estimation