Robustness to speaker position in distant-talking automatic speech recognition

Author

Gomez, Raquel ; Nakamura, Kentaro ; Nakadai, Kazuhiro

Author_Institution

Honda Res. Inst. Japan Co. Ltd., Japan

fYear

2013

Firstpage

7034

Lastpage

7038

Abstract

In this paper, we present a method that significantly improves on our previous work in single-channel dereverberation and is more robust to changes in speaker position in distant-talking ASR. First, we update the room transfer function (RTF) and the weighting parameters for dereverberation to match the target speaker position. This scheme corrects the variation of speech power with position at the waveform level, and its impact on the acoustic model is then verified. Next, we implement a fast acoustic model update that reflects the speech power level at the target speaker position. The update scheme is simple and avoids time-consuming model re-estimation, so the proposed method can be executed online. Together, these corrective measures significantly reduce the mismatch between training and testing conditions. We test our method using real reverberant data from different locations inside a room. Experimental results show that the proposed method outperforms conventional methods in terms of ASR performance. Moreover, our fast acoustic model update scheme is on par with time-consuming model re-estimation in terms of recognition performance.

Keywords

reverberation; speaker recognition; speech enhancement; RTF; acoustic model; distant talking ASR; distant-talking automatic speech recognition; fast acoustic model update scheme; room transfer function; single-channel dereverberation; speaker position; speech power level; speech power variation; testing conditions; time-consuming model reestimation; training conditions; waveform level; weighting parameters; Acoustics; Adaptation models; Data models; Hidden Markov models; Robustness; Speech; Automatic Speech Recognition; Dereverberation

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location

Vancouver, BC

ISSN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2013.6639026

Filename

6639026