DocumentCode
1652174
Title
Connecting concepts from vision and speech processing
Author
Wachsmuth, Sven ; Sagerer, Gerhard
Author_Institution
Fac. of Technol., Bielefeld Univ., Germany
fYear
1999
fDate
1999
Firstpage
1
Lastpage
19
Abstract
This paper addresses the problem of how to establish referential links between interpretations of speech and visual data. In order to get rid of erroneous, vague, or incomplete conceptual descriptions, we propose a probabilistic interaction scheme. The modelling of dependencies and the calculation of inferences are realized by using Bayesian networks. This interaction scheme provides a basis for disambiguation and error recovery. We implemented an interaction component in an assembly task environment. A robot constructor can be instructed by speech and pointing gestures in order to connect primitive component parts of a wooden toy construction kit. The system is evaluated on a test data set that consists of 448 spoken utterances from 16 speakers who name objects on 10 images from different scenes. First results show the effectiveness and robustness of the probabilistic approach.
Keywords
belief networks; speech processing; Bayesian networks; assembly task environment; disambiguation; error recovery; inferences; pointing gestures; probabilistic interaction scheme; referential links; speech data; test data set; visual data; Artificial intelligence; Bayesian methods; Information resources; Joining processes; Layout; Robotic assembly; Robots; Robustness; System testing
fLanguage
English
Publisher
IEEE
Conference_Titel
Integration of Speech and Image Understanding, 1999. Proceedings
Conference_Location
Corfu
Print_ISBN
0-7695-0471-X
Type
conf
DOI
10.1109/ISIU.1999.824829
Filename
824829