Title :
Learning from multimodal observations
Author_Institution :
Media Lab., MIT, Cambridge, MA, USA
Abstract :
Human-computer interaction based on recognition of speech, gestures, and other natural modalities is on the rise. Recognition technologies are typically developed in a statistical framework and require large amounts of training data. The cost of collecting manually annotated data is usually the bottleneck in developing such systems. We explore the idea of learning from unannotated data by leveraging information across multiple modes of input. We present a working system, inspired by infant language learning, that learns from untranscribed speech and images.
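The abstract names the idea but not the mechanism. The sketch below is a rough illustration only, not the authors' actual system: it shows one generic way cross-modal co-occurrence can substitute for manual annotation. Word-like units from untranscribed utterances are paired with co-occurring visual categories, and word-category pairs with high pointwise mutual information are kept as learned lexical items. All data, names, and the PMI scoring choice here are hypothetical.

import math
from collections import Counter

# Hypothetical toy data: each observation pairs an untranscribed "utterance"
# (a bag of word-like acoustic units) with a visual context (a clustered
# image category). No manual annotation links words to objects.
observations = [
    (["look", "at", "the", "ball"], "ball"),
    (["the", "red", "ball"], "ball"),
    (["see", "the", "doggy"], "dog"),
    (["good", "doggy"], "dog"),
    (["where", "is", "the", "ball"], "ball"),
    (["the", "doggy", "runs"], "dog"),
]

word_counts = Counter()
ctx_counts = Counter()
pair_counts = Counter()
n = 0
for words, ctx in observations:
    for w in set(words):          # count each word once per utterance
        word_counts[w] += 1
        pair_counts[(w, ctx)] += 1
    ctx_counts[ctx] += 1
    n += 1

# Score each (word, visual context) pair by pointwise mutual information:
# PMI(w, c) = log P(w, c) / (P(w) P(c)). High-PMI pairs are candidate
# lexical items learned purely from cross-modal co-occurrence.
def pmi(w, c):
    p_wc = pair_counts[(w, c)] / n
    return math.log(p_wc / ((word_counts[w] / n) * (ctx_counts[c] / n)))

lexicon = sorted(((pmi(w, c), w, c) for (w, c) in pair_counts), reverse=True)
for score, w, c in lexicon[:4]:
    print(f"{w!r} <-> {c!r}  PMI={score:.2f}")

On this toy data the content words "ball" and "doggy" pair with their visual categories (PMI about 0.69) ahead of function words like "the" (PMI about 0.18), which is the qualitative behavior the abstract describes: word-meaning associations emerge without transcribed training data.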
Keywords :
gesture recognition; learning (artificial intelligence); speech recognition; speech-based user interfaces; human-computer interaction; infant language learning; multimodal observations; statistical framework; training data; unannotated data; untranscribed speech; Appropriate technology; Cameras; Costs; Laboratories; Microphones; Natural languages; Pattern recognition
Conference_Titel :
2000 IEEE International Conference on Multimedia and Expo (ICME 2000)
Conference_Location :
New York, NY, USA
Print_ISBN :
0-7803-6536-4
DOI :
10.1109/ICME.2000.869668