DocumentCode :
352466
Title :
Integration of speech and vision using mutual information
Author :
Roy, Deb
Author_Institution :
Media Lab., MIT, Cambridge, MA, USA
Volume :
6
fYear :
2000
fDate :
2000
Firstpage :
2369
Abstract :
We are developing a system that learns words from co-occurring spoken and visual input. The goal is to automatically segment continuous speech at word boundaries without a lexicon, and to form visual categories that correspond to spoken words. Mutual information is used to integrate acoustic and visual distance metrics in order to extract an audio-visual lexicon from raw input. We report results of experiments with a corpus of infant-directed speech and images.
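The abstract names mutual information as the integration criterion without giving its form. Below is a minimal sketch (Python), assuming the audio-visual lexicon is built by scoring each candidate pairing of an acoustic prototype and a visual category by the mutual information of their binary co-occurrence across utterances, and keeping high-scoring pairs; the function name and counts are hypothetical illustrations, not the paper's implementation.

import math
from itertools import product

def mutual_information(counts):
    # counts maps (a, v) -> number of utterances, where a = 1 if the
    # acoustic prototype matched the spoken utterance and v = 1 if the
    # visual category matched the co-occurring image, else 0.
    total = sum(counts.values())
    p_a = {a: sum(counts[a, v] for v in (0, 1)) / total for a in (0, 1)}
    p_v = {v: sum(counts[a, v] for a in (0, 1)) / total for v in (0, 1)}
    mi = 0.0
    for a, v in product((0, 1), repeat=2):
        p = counts[a, v] / total
        if p > 0:  # the 0 * log 0 term is taken as 0
            mi += p * math.log2(p / (p_a[a] * p_v[v]))
    return mi

# Hypothetical co-occurrence counts for one candidate pair:
counts = {(1, 1): 40, (1, 0): 5, (0, 1): 8, (0, 0): 47}
print(mutual_information(counts))  # high MI -> keep the pair in the lexicon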
Keywords :
audio-visual systems; image processing; learning systems; natural languages; speech processing; acoustic distance metric; audio-visual lexicon extraction; automatic continuous speech segmentation; co-occurring input; infant-directed image corpus; infant-directed speech corpus; mutual information; speech-vision integration; spoken input; spoken words; visual categories; visual distance metric; visual input; word boundaries; word learning; Acoustic signal detection; Computational modeling; Image segmentation; Laboratories; Learning systems; Mutual information; Natural languages; Shape; Speech; Streaming media
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '00)
Conference_Location :
Istanbul, Turkey
ISSN :
1520-6149
Print_ISBN :
0-7803-6293-4
Type :
conf
DOI :
10.1109/ICASSP.2000.859317
Filename :
859317