Robot-directed speech detection using Multimodal Semantic Confidence based on speech, image, and motion

Author

Zuo, Xiang ; Iwahashi, Naoto ; Taguchi, Ryo ; Matsuda, Shigeki ; Sugiura, Komei ; Funakoshi, Kotaro ; Nakano, Mikio ; Oka, Natsuki

Author_Institution

Adv. Telecommun. Res. Labs., Kyoto, Japan

fYear

2010

fDate

14-19 March 2010

Firstpage

2458

Lastpage

2461

Abstract

In this paper, we propose a novel method to detect robot-directed (RD) speech that adopts the Multimodal Semantic Confidence (MSC) measure. The MSC measure is used to decide whether the speech can be interpreted as a feasible action under the current physical situation in an object manipulation task. This measure is calculated by integrating speech, image, and motion confidence measures with weightings that are optimized by logistic regression. Experimental results show that, compared with a baseline method that uses speech confidence only, MSC achieved an absolute increase of 5% for clean speech and 12% for noisy speech in terms of average maximum F-measure.
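
The integration step described in the abstract can be illustrated with a minimal sketch. It assumes the MSC score is a logistic function of a weighted sum of the three confidence values, and it fits the weights by plain batch gradient descent as a stand-in for whatever optimizer the authors used. The abstract does not give the exact formulation, so the function names (`msc`, `fit_weights`, `max_f_measure`), the bias term, and the toy data are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def msc(confidences, weights, bias):
    """Combine (speech, image, motion) confidences into one score.

    Assumed form: sigmoid of a weighted sum plus a bias term.
    """
    z = bias + sum(w * c for w, c in zip(weights, confidences))
    return sigmoid(z)


def fit_weights(samples, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    n_features = len(samples[0])
    n = len(samples)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = msc(x, weights, bias) - y  # gradient of the loss w.r.t. z
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def max_f_measure(scores, labels):
    """Sweep the decision threshold over observed scores; return the best F-measure."""
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Made-up toy data: (speech, image, motion) confidences per utterance.
# Label 1 = robot-directed (RD) speech, 0 = other speech.
samples = [
    (0.9, 0.8, 0.7), (0.8, 0.9, 0.9), (0.7, 0.7, 0.8),
    (0.9, 0.2, 0.1), (0.8, 0.3, 0.2), (0.3, 0.2, 0.3),
]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = fit_weights(samples, labels)
scores = [msc(x, weights, bias) for x in samples]
```

In this toy set, two of the non-RD utterances have high speech confidence but low image and motion confidence, which is the kind of case where a speech-only baseline would be expected to struggle. The threshold sweep in `max_f_measure` mirrors the "maximum F-measure" criterion mentioned in the abstract only in spirit; the averaging procedure used in the paper is not reproduced here.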

Keywords

human-robot interaction; motion estimation; object recognition; regression analysis; robots; speech processing; image measure; logistic regression; motion measure; multimodal semantic confidence; object manipulation; robot-directed speech detection; Acoustic signal detection; Current measurement; Face detection; Gas detectors; Humans; Motion detection; Motion measurement; Speech analysis; Speech recognition

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on

Conference_Location

Dallas, TX

ISSN

1520-6149

Print_ISBN

978-1-4244-4295-9

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2010.5494889

Filename

5494889