Title :
Audio-visual scene understanding utilizing text information for a cooking support robot
Author :
Ryosuke Kojima;Osamu Sugiyama;Kazuhiro Nakadai
Author_Institution :
Graduate School of Information Science and Engineering, Tokyo Institute of Technology, 2-12-1, O-okayama, Meguro-ku, 152-8552, JAPAN
Abstract :
This paper addresses multimodal "scene understanding" for a robot using audio-visual and text information. Scene understanding is defined as extracting six-W information (What, When, Where, Who, Why, and hoW) about the surrounding environment. Although scene understanding for a robot has been studied in the fields of robot vision and audition, only the first four Ws have been considered, leaving out why and how information. We therefore focus on extracting how information, in particular in cooking scenes. In cooking scenes, we define how information as a cooking procedure; extracting it allows a robot to give appropriate cooking advice. To realize such cooking support, we propose a multimodal cooking procedure recognition framework consisting of a Convolutional Neural Network (CNN) and a Hierarchical Hidden Markov Model (HHMM). The CNN is known as one of the most advanced classifiers, and it is applied to recognize cooking events from audio and visual information. The HHMM models a cooking procedure as a sequence of cooking events, combining the relationships between cooking events obtained from text data on the web with the cooking events classified by the CNN. The proposed framework thus integrates three modalities: audio, visual, and text. We constructed an interactive cooking support system based on the proposed framework, which advises the user on the next step in the current cooking procedure through human-robot communication. Preliminary results with simulated and real recorded multimodal scenes showed the robustness of the proposed framework in noisy and/or occluded situations.
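The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps the HHMM for a flat HMM decoded with Viterbi, uses hand-written per-frame probabilities in place of CNN outputs, treats those posteriors as emission scores, and fuses audio and visual streams by a normalized product. The event names, the transition matrix, and all function names are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against log(0)

# Hypothetical cooking events and transition probabilities of the kind that
# might be estimated from recipe text (rows: from-event, columns: to-event).
EVENTS = ["cut", "fry", "boil"]
TRANS = np.array([
    [0.6, 0.3, 0.1],
    [0.1, 0.6, 0.3],
    [0.1, 0.1, 0.8],
])
START = np.array([0.8, 0.1, 0.1])


def fuse(audio_post, visual_post):
    """Fuse per-frame audio and visual posteriors by a normalized product."""
    fused = audio_post * visual_post
    return fused / fused.sum(axis=1, keepdims=True)


def viterbi(obs_post, trans, start):
    """Return the most likely event index sequence for per-frame scores."""
    n_frames, n_states = obs_post.shape
    log_delta = np.log(start + EPS) + np.log(obs_post[0] + EPS)
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        # scores[i, j]: best log-score ending in i at t-1, then moving to j
        scores = log_delta[:, None] + np.log(trans + EPS)
        back[t] = scores.argmax(axis=0)
        log_delta = scores.max(axis=0) + np.log(obs_post[t] + EPS)
    # Backtrack from the best final state
    path = [int(log_delta.argmax())]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]


def next_step(current_state, trans):
    """Suggest the most likely following event that differs from the current one."""
    probs = trans[current_state].copy()
    probs[current_state] = 0.0  # ignore staying in the same event
    return int(probs.argmax())


if __name__ == "__main__":
    # Stand-ins for CNN outputs; the third audio frame is nearly uniform (noisy).
    audio = np.array([[0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
                      [0.34, 0.33, 0.33], [0.1, 0.8, 0.1]])
    visual = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1],
                       [0.2, 0.7, 0.1], [0.1, 0.7, 0.2]])
    path = viterbi(fuse(audio, visual), TRANS, START)
    print([EVENTS[i] for i in path])
    print("suggested next step:", EVENTS[next_step(path[-1], TRANS)])
```

In this sketch the text-derived transition matrix serves two purposes: it smooths the frame-wise classifications when one modality is unreliable, and it supplies the advice for the next step once the current event is known.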
Keywords :
"Hidden Markov models","Data mining","Robot sensing systems","Cameras","Data models","Microphones"
Conference_Titel :
Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on
DOI :
10.1109/IROS.2015.7353973