DocumentCode
352466
Title
Integration of speech and vision using mutual information
Author
Roy, Deb
Author_Institution
Media Lab., MIT, Cambridge, MA, USA
Volume
6
fYear
2000
fDate
2000
Firstpage
2369
Abstract
We are developing a system which learns words from co-occurring spoken and visual input. The goal is to automatically segment continuous speech at word boundaries without a lexicon, and to form visual categories which correspond to spoken words. Mutual information is used to integrate acoustic and visual distance metrics in order to extract an audio-visual lexicon from raw input. We report results of experiments with a corpus of infant-directed speech and images.
Keywords
audio-visual systems; image processing; learning systems; natural languages; speech processing; acoustic distance metric; audio-visual lexicon extraction; automatic continuous speech segmentation; co-occurring input; infant-directed image corpus; infant-directed speech corpus; mutual information; speech-vision integration; spoken input; spoken words; visual categories; visual distance metric; visual input; word boundaries; word learning; Acoustic signal detection; Computational modeling; Image segmentation; Laboratories; Learning systems; Mutual information; Natural languages; Shape; Speech; Streaming media;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
Conference_Location
Istanbul
ISSN
1520-6149
Print_ISBN
0-7803-6293-4
Type
conf
DOI
10.1109/ICASSP.2000.859317
Filename
859317