Regional Information Center for Science and Technology - Stereo-Based Stochastic Mapping for Robust Speech Recognition

DocumentCode :

1135694

Title :

Stereo-Based Stochastic Mapping for Robust Speech Recognition

Author :

Afify, Mohamed ; Cui, Xiaodong ; Gao, Yuqing

Author_Institution :

Orange Lab., Smart Village, Cairo, Egypt

Volume :

Issue :

fYear :

2009

Firstpage :

1325

Lastpage :

1334

Abstract :

We present a stochastic mapping technique for robust speech recognition that uses stereo data. The idea is to construct a Gaussian mixture model (GMM) for the joint distribution of the clean and noisy features and to use this distribution to predict the clean speech during testing. The proposed mapping is called stereo-based stochastic mapping (SSM). Two different estimators are considered: one is iterative and based on the maximum a posteriori (MAP) criterion, while the other uses the minimum mean square error (MMSE) criterion. The resulting estimators are effectively a mixture of linear transforms weighted by component posteriors, and the parameters of the linear transforms are derived from the joint distribution. Compared with the uncompensated baseline, the proposed method yields a 45% relative improvement in word error rate (WER) for digit recognition in the car. In the same setting, SSM outperforms SPLICE and gives results similar to the MMSE compensation of Huang. A 66% relative improvement in WER is observed when SSM is applied in conjunction with multistyle training (MST) for large-vocabulary English speech recognition in a real environment. Also, combining the proposed mapping with CMLLR leads to about a 38% relative improvement over CMLLR alone on real field data.
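The MMSE and MAP estimators summarized in the abstract can be illustrated with a small sketch. It assumes scalar features (one clean value x and one noisy value y per frame) and a joint GMM whose parameters are already known; in practice they would be trained on stereo data, and real front ends use feature vectors with full covariance blocks. Under a joint Gaussian component k, p(x | y, k) is Gaussian with mean A_k y + b_k, where A_k = cov(x, y) / var(y), so the MMSE estimate is the posterior-weighted mixture of these linear transforms. The names `JointComponent`, `posteriors`, `mmse_estimate`, and `map_estimate` are hypothetical, and the MAP routine uses a generic fixed-point mode search for a Gaussian mixture that is assumed here; it is not necessarily the paper's exact iterative update.

```python
import math
from dataclasses import dataclass


@dataclass
class JointComponent:
    """One component of a GMM over the joint (clean x, noisy y) pair, scalar features."""
    weight: float  # mixture weight pi_k
    mu_x: float    # clean-feature mean
    mu_y: float    # noisy-feature mean
    s_xx: float    # var(x)
    s_xy: float    # cov(x, y)
    s_yy: float    # var(y)


def gauss_pdf(v, mean, var):
    # Plain density; a production version would work in the log domain.
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(y, comps):
    """p(k | y), computed from the noisy-feature marginal of each joint component."""
    scores = [c.weight * gauss_pdf(y, c.mu_y, c.s_yy) for c in comps]
    total = sum(scores)
    return [s / total for s in scores]


def conditional(c, y):
    """Mean and variance of p(x | y, k); the mean is a linear transform A_k * y + b_k."""
    a = c.s_xy / c.s_yy          # slope A_k
    b = c.mu_x - a * c.mu_y      # offset b_k
    var = c.s_xx - c.s_xy ** 2 / c.s_yy
    return a * y + b, var


def mmse_estimate(y, comps):
    """E[x | y] = sum_k p(k | y) * (A_k * y + b_k)."""
    post = posteriors(y, comps)
    return sum(p * conditional(c, y)[0] for p, c in zip(post, comps))


def map_estimate(y, comps, iters=50, tol=1e-10):
    """Mode of p(x | y) = sum_k p(k | y) N(x; A_k y + b_k, var_k).

    Assumption: generic fixed-point mode search for a Gaussian mixture,
    not necessarily the update used in the paper.
    """
    post = posteriors(y, comps)
    conds = [conditional(c, y) for c in comps]
    x = mmse_estimate(y, comps)  # start from the MMSE estimate
    for _ in range(iters):
        # Responsibility of each component for the current estimate x.
        r = [p * gauss_pdf(x, m, v) for p, (m, v) in zip(post, conds)]
        total = sum(r)
        r = [ri / total for ri in r]
        # Setting d/dx log p(x | y) = 0 gives a precision-weighted mean of the component means.
        num = sum(ri * m / v for ri, (m, v) in zip(r, conds))
        den = sum(ri / v for ri, (m, v) in zip(r, conds))
        x_new = num / den
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


if __name__ == "__main__":
    # Two hypothetical components with different slopes A_k.
    model = [
        JointComponent(0.5, -1.0, -2.0, 1.0, 0.5, 1.0),
        JointComponent(0.5, 1.0, 2.0, 1.0, 0.8, 1.0),
    ]
    for y in (-3.0, 0.5, 3.0):
        print(y, mmse_estimate(y, model), map_estimate(y, model))
```

With a single component both estimators reduce to the same linear regression of x on y; with several components the MMSE estimate blends the per-component transforms by their posteriors, while the MAP estimate picks the dominant mode of p(x | y).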

Keywords :

Gaussian processes; least mean squares methods; maximum likelihood estimation; speech recognition; stochastic systems; Gaussian mixture; MAP criterion; linear transforms; maximum a posteriori criterion; minimum mean square error; multistyle training; robust speech recognition; stereo based stochastic mapping; word error rate; Automatic speech recognition; Decoding; Error analysis; Noise robustness; Predictive models; Prototypes; Speech recognition; Stochastic processes; Taylor series; Working environment noise; Noise robustness; nonlinear mapping; speech recognition; stereo-data;

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2009.2018017

Filename :

5165116

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1135694