• DocumentCode
    417295
  • Title
    Combining feature compensation and weighted Viterbi decoding for noise robust speech recognition with limited adaptation data
  • Author
    Cui, Xiaodong; Alwan, Abeer
  • Author_Institution
    Dept. of Electr. Eng., Univ. of California, Los Angeles, CA, USA
  • Volume
    1
  • fYear
    2004
  • fDate
    17-21 May 2004
  • Abstract
    Acoustic models trained on clean speech degrade in the presence of background noise. In some situations, only a limited amount of noisy data from the new environment is available for adapting the clean models. This paper proposes a feature compensation approach based on polynomial regression of the signal-to-noise ratio (SNR). The clean acoustic models remain unchanged; a bias, modeled as a polynomial function of utterance SNR, is estimated and removed from the noisy features. Depending on the amount of noisy data available, the algorithm can be carried out at different levels of granularity. The similarity between the residual distribution and the clean models is estimated from the Euclidean distance and used as the confidence factor in a back-end weighted Viterbi decoding (WVD) algorithm. With limited amounts of noisy data, the feature compensation algorithm outperforms maximum likelihood linear regression (MLLR) on the Aurora2 database. Weighted Viterbi decoding further improves recognition accuracy.
  • Keywords
    Viterbi decoding; feature extraction; maximum likelihood estimation; polynomial approximation; regression analysis; speech recognition; Aurora2 database; Euclidean distance; WVD algorithm; back-end weighted Viterbi decoding; background noise; bias estimation; clean models; confidence factor; feature compensation; limited adaptation data; noise robust speech recognition; polynomial function; polynomial regression; recognition accuracy; residual distribution; signal-to-noise ratio; similarity estimation; utterance SNR; weighted Viterbi decoding; Acoustic noise; Maximum likelihood decoding; Maximum likelihood linear regression; Noise robustness; Polynomials; Signal to noise ratio; Speech enhancement; Speech recognition; Viterbi algorithm; Working environment noise;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-8484-9
  • Type
    conf
  • DOI
    10.1109/ICASSP.2004.1326149
  • Filename
    1326149
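
The two ideas named in the abstract, removing an SNR-dependent polynomial bias from the noisy features and weighting each frame in the Viterbi search by a confidence factor, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the example coefficients and probabilities, and the choice to apply the confidence factor as a multiplier on each frame's log-likelihood (a common form of weighted Viterbi decoding). The paper's bias estimation procedure and its Euclidean-distance confidence measure are not reproduced.

```python
import math

def polynomial_bias(coeffs, snr_db):
    """Evaluate c0 + c1*snr + c2*snr^2 + ... using Horner's rule."""
    bias = 0.0
    for c in reversed(coeffs):
        bias = bias * snr_db + c
    return bias

def compensate(noisy_frame, coeffs_per_dim, snr_db):
    """Subtract the SNR-dependent bias from each feature dimension.

    coeffs_per_dim holds one (hypothetical) coefficient list per dimension.
    """
    return [x - polynomial_bias(c, snr_db)
            for x, c in zip(noisy_frame, coeffs_per_dim)]

def weighted_viterbi(log_init, log_trans, log_obs, weights):
    """Viterbi search with frame t's log-likelihoods scaled by weights[t].

    log_init[j]     : log initial probability of state j
    log_trans[i][j] : log transition probability from state i to state j
    log_obs[t][j]   : log-likelihood of frame t under state j
    weights[t]      : assumed confidence in [0, 1]; 0 ignores the frame
    Returns (best_state_path, best_log_score).
    """
    n_states = len(log_init)
    delta = [log_init[j] + weights[0] * log_obs[0][j] for j in range(n_states)]
    backptrs = []
    for t in range(1, len(log_obs)):
        new_delta, ptr = [], []
        for j in range(n_states):
            # Best predecessor of state j, judged on the score so far.
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j]
                             + weights[t] * log_obs[t][j])
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best final state.
    last = max(range(n_states), key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]

# Toy two-state example with made-up probabilities.
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.8), math.log(0.2)],
             [math.log(0.2), math.log(0.8)]]
log_obs = [[math.log(0.9), math.log(0.1)],
           [math.log(0.05), math.log(0.95)],
           [math.log(0.9), math.log(0.1)]]

trusted_path, _ = weighted_viterbi(log_init, log_trans, log_obs, [1.0, 1.0, 1.0])
distrusted_path, _ = weighted_viterbi(log_init, log_trans, log_obs, [1.0, 0.0, 1.0])
```

In this toy example, trusting all three frames lets the middle frame pull the path into state 1, while setting its weight to 0 leaves the decision for that frame to the transition probabilities alone, so the path stays in state 0.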