• DocumentCode
    1188323
  • Title
    Noise robust speech recognition using feature compensation based on polynomial regression of utterance SNR
  • Author
    Cui, Xiaodong ; Alwan, Abeer
  • Author_Institution
    Dept. of Electr. Eng., Univ. of California, Los Angeles, CA, USA
  • Volume
    13
  • Issue
    6
  • fYear
    2005
  • Firstpage
    1161
  • Lastpage
    1172
  • Abstract
    A feature compensation (FC) algorithm based on polynomial regression of utterance signal-to-noise ratio (SNR) for noise robust automatic speech recognition (ASR) is proposed. In this algorithm, the bias between clean and noisy speech features is approximated by a set of polynomials, which are estimated from adaptation data from the new environment by the expectation-maximization (EM) algorithm under the maximum likelihood (ML) criterion. During recognition, the utterance SNR of the speech signal is first estimated, and the noisy speech features are then compensated using the regression polynomials. The compensated speech features are decoded via acoustic HMMs trained with clean data. Comparative experiments between FC and maximum likelihood linear regression (MLLR) are performed on the Aurora 2 (English) database and the German part of the Aurora 3 database. In the Aurora 2 experiments, there are two MLLR implementations: pooling adaptation data across all SNRs, and using three distinct SNR clusters. For each type of noise, FC achieves, on average, a word error rate reduction of 16.7% and 16.5% for Set A, and 20.5% and 14.6% for Set B, compared to the first and second MLLR implementations, respectively. For each SNR condition, FC achieves, on average, a word error rate reduction of 33.1% and 34.5% for Set A, and 23.6% and 21.4% for Set B. Results on the Aurora 3 database show that the best FC performance exceeds that of MLLR by 15.9%, 3.0% and 14.6% for well-matched, medium-mismatched and high-mismatched conditions, respectively.
  • Keywords
    acoustic signal processing; decoding; error statistics; hidden Markov models; optimisation; polynomial approximation; regression analysis; speech recognition; ASR; acoustic HMM; approximation; automatic speech recognition; decoding; expectation-maximization algorithm; feature compensation algorithm; hidden Markov model; maximum likelihood criterion; polynomial regression; speech signal; word error rate; Acoustic noise; Automatic speech recognition; Error analysis; Maximum likelihood estimation; Maximum likelihood linear regression; Noise robustness; Polynomials; Signal to noise ratio; Speech recognition; Working environment noise; Feature compensation; noise robust speech recognition; polynomial regression; signal-to-noise ratio (SNR) estimation
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type
    jour
  • DOI
    10.1109/TSA.2005.853002
  • Filename
    1518916
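
The compensation step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the bias is additive (noisy feature = clean feature + bias), that each feature dimension has its own polynomial in the utterance SNR, and that paired clean and noisy training features are available, so the polynomials can be fitted by ordinary least squares. The abstract instead estimates them with EM under the ML criterion from adaptation data. The function names `fit_bias_polynomials` and `compensate` and the polynomial order are hypothetical.

```python
import numpy as np


def fit_bias_polynomials(snr_db, clean, noisy, order=2):
    """Fit one bias polynomial in SNR per feature dimension.

    Assumption: least squares on paired data stands in for the EM/ML
    estimation described in the abstract.
    snr_db: shape (N,), clean and noisy: shape (N, D).
    Returns coefficients of shape (order + 1, D), lowest power first.
    """
    snr_db = np.asarray(snr_db, dtype=float)
    # Assumed additive model: bias = noisy - clean.
    bias = np.asarray(noisy, dtype=float) - np.asarray(clean, dtype=float)
    # Columns are 1, snr, snr^2, ..., snr^order.
    design = np.vander(snr_db, N=order + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(design, bias, rcond=None)
    return coeffs


def compensate(noisy_frames, utterance_snr_db, coeffs):
    """Subtract the bias predicted at the utterance SNR from every frame.

    noisy_frames: shape (T, D), utterance_snr_db: scalar estimate.
    """
    powers = float(utterance_snr_db) ** np.arange(coeffs.shape[0])
    predicted_bias = powers @ coeffs  # shape (D,)
    return np.asarray(noisy_frames, dtype=float) - predicted_bias
```

In use, the utterance SNR would be estimated first, `compensate` would be applied to the noisy features, and the result would be passed to a recognizer trained on clean data, mirroring the order of steps in the abstract.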