Temporal smearing compensation in reverberant environment for speech-based human-robot interaction

Author

Gomez, Randy ; Nakamura, Keisuke ; Mizumoto, Takeshi ; Nakadai, Kazuhiro

Author_Institution

Honda Res. Inst. Japan Ltd. Co., Wako, Japan

fYear

2015

fDate

26-30 May 2015

Firstpage

3347

Lastpage

3353

Abstract

Speech-based human-robot interaction is often plagued by issues such as reverberation and changes in speaker position, which degrade overall performance. In this paper, we present a method for compensating for the joint effects of reverberation and changes in speaker position. The acoustic perturbation caused by these two factors takes its toll on Automatic Speech Recognition (ASR) and, in turn, on Spoken Language Understanding (SLU). Consequently, this leads to a failure of the human-robot interaction experience. The proposed method is specifically designed to address the challenging environmental conditions in which robots are deployed. First, we analyze the impact of reverberation in the form of temporal smearing for each change in speaker position. Then, we extract the smearing coefficients that capture the joint dynamics between the speech signal at the current position and the room acoustics as observed by the robot. These coefficients are used to update the room transfer function (RTF), and the suppression parameters are stored offline. Moreover, all of these processes are optimized in the context of the ASR system for robot applications. In the online mode, the reverberant data at an arbitrary position is processed using the parameters pre-computed offline. This effectively compensates for the joint effects of reverberation at the arbitrary speaker position. Experimental results using real data gathered in a human-robot communication setting show that the proposed method outperforms existing methods.

Keywords

acoustic signal processing; human-robot interaction; perturbation techniques; reverberation; speech recognition; speech-based user interfaces; ASR system; RTF; SLU; acoustic perturbation; automatic speech recognition; joint dynamics; online mode; reverberant environment; room transfer function; smearing coefficients; speaker position; speech-based human-robot interaction; spoken language understanding; suppression parameters; temporal smearing compensation; Databases; Hidden Markov models; Joints; Reverberation; Robots; Speech; Automatic Speech Recognition; Dereverberation; Robustness; Speech Enhancement;

fLanguage

English

Publisher

IEEE

Conference_Titel

Robotics and Automation (ICRA), 2015 IEEE International Conference on

Conference_Location

Seattle, WA

Type

conf

DOI

10.1109/ICRA.2015.7139661

Filename

7139661