Title :
Multi-talker speech recognition under ego-motion noise using Missing Feature Theory
Author :
Ince, Gokhan ; Nakadai, Kazuhiro ; Rodemann, Tobias ; Tsujino, Hiroshi ; Imura, Jun-ichi
Author_Institution :
Honda Res. Inst. Japan Co., Ltd., Wako, Japan
Abstract :
This paper presents a system that enables a mobile robot to recognize a target speaker's speech even while the robot is performing an action and multiple speakers are talking in the room. The problems associated with this task are twofold: (1) While the robot is moving, the motors in its joints inevitably generate ego-motion noise. (2) Recognizing target speech against interfering speech signals is difficult. Since the typical solutions to (1) and (2), motor noise suppression and sound source separation, both introduce distortion into the processed signals, the performance of automatic speech recognition (ASR) deteriorates. Instead of removing the ego-motion noise with conventional noise suppression methods, in this work we investigate methods that eliminate the unreliable parts of the audio features contaminated by the ego-motion noise. For this purpose, we model masks that filter unreliable speech features based on the ratio of speech and motor noise energies. We analyze the performance of the proposed technique under various test conditions by comparing it with existing Missing Feature Theory-based ASR implementations. Finally, we propose an integration framework for two different masks, one designed to eliminate ego noise and the other to filter the leakage energy of interfering sound sources. We demonstrate that the proposed methods achieve high ASR accuracy.
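The masking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0 dB threshold, the example energies, and the element-wise AND used to combine the two masks are all assumptions made for illustration. Each feature bin is marked reliable (1) when the ratio of speech energy to motor noise energy exceeds a threshold, and a second mask (here a made-up leakage mask) is merged with it.

```python
import math

def snr_db(speech_energy, noise_energy, eps=1e-12):
    """Speech-to-noise energy ratio in dB; eps avoids division by zero."""
    return 10.0 * math.log10((speech_energy + eps) / (noise_energy + eps))

def hard_mask(speech, noise, threshold_db=0.0):
    """Binary mask: 1 where a feature bin is treated as reliable, else 0.

    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    return [1 if snr_db(s, n) > threshold_db else 0 for s, n in zip(speech, noise)]

def combine_masks(mask_a, mask_b):
    """Assumed integration rule: a bin is reliable only if both masks agree."""
    return [a * b for a, b in zip(mask_a, mask_b)]

# Illustrative per-bin energies (invented numbers).
speech = [4.0, 0.5, 2.0, 0.1]
motor_noise = [1.0, 1.0, 0.2, 0.4]

ego_mask = hard_mask(speech, motor_noise)
leak_mask = [1, 1, 0, 1]  # hypothetical mask for leakage from interfering speakers
combined = combine_masks(ego_mask, leak_mask)
```

In a Missing Feature Theory recognizer, bins marked 0 would be excluded or down-weighted when computing acoustic likelihoods. A soft mask with values between 0 and 1 is a common alternative to the hard threshold used here.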
Keywords :
audio signal processing; feature extraction; filtering theory; interference (signal); interference suppression; mobile robots; source separation; speaker recognition; ASR; automatic speech recognition; ego-motion noise; missing feature theory; mobile robot; motor noise suppression; multitalker speech recognition; sound source separation; speech signal interference; target speaker recognition; unreliable speech feature filtering
Conference_Title :
Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on
Conference_Location :
Taipei
Print_ISBN :
978-1-4244-6674-0
DOI :
10.1109/IROS.2010.5650112