Title :
Noisy Constrained Maximum-Likelihood Linear Regression for Noise-Robust Speech Recognition
Author :
Kim, D.K. ; Gales, M.J.F.
Author_Institution :
Dept. of Electron. & Comput. Eng., Chonnam Nat. Univ., Gwangju, South Korea
Abstract :
Adaptive training is a widely used technique for building speech recognition systems on nonhomogeneous training data. Recently, there has been interest in applying these approaches to situations where there are significant levels of background noise in the training data. Various schemes for adaptive training are based on noise- or speaker-specific transforms of features to yield estimates of the clean speech. However, when there are high levels of background noise, these clean speech estimates may be poor, resulting in degraded performance. In this paper, a new approach for adaptive training on noise-corrupted training data is presented. It extends a popular form of linear transform for model-based adaptation and adaptive training, constrained MLLR (CMLLR), to reflect additional uncertainty from noise-corrupted observations. This new form of adaptation transform is called noisy CMLLR (NCMLLR). NCMLLR uses a modified version of the generative model between clean speech and noisy observations, similar to factor analysis (FA). However, in contrast to FA, here the generative model describes an adaptation transform rather than a covariance matrix structure. The use of NCMLLR for adaptive training using an expectation-maximization approach is described. Discriminative adaptive training with NCMLLR, based on the minimum phone error criterion, is also described. Experimental results comparing NCMLLR with standard adaptive training schemes are given on a noise-corrupted version of Resource Management, the ARPA 1994 CSRNAB Spoke 10 task, and in-car recorded data.
Keywords :
expectation-maximisation algorithm; regression analysis; speech recognition; transforms; ARPA 1994 CSRNAB Spoke 10 task; adaptation transform; constrained MLLR; discriminative adaptive training; expectation-maximization approach; factor analysis; minimum phone error criterion; model-based adaptation; noise-corrupted training data; noise-robust speech recognition; noisy constrained maximum-likelihood linear regression; resource management; Adaptation model; Background noise; Linear regression; Management training; Maximum likelihood estimation; Noise robustness; Speech analysis; Speech enhancement; Speech recognition; Training data; Adaptive training; noise robustness; speaker adaptation; speech recognition;
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2010.2047756