• DocumentCode
    352466
  • Title
    Integration of speech and vision using mutual information
  • Author
    Roy, Deb
  • Author_Institution
    Media Lab., MIT, Cambridge, MA, USA
  • Volume
    6
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    2369
  • Abstract
    We are developing a system that learns words from co-occurring spoken and visual input. The goal is to automatically segment continuous speech at word boundaries without a lexicon, and to form visual categories that correspond to spoken words. Mutual information is used to integrate acoustic and visual distance metrics in order to extract an audio-visual lexicon from raw input. We report results of experiments with a corpus of infant-directed speech and images.
  • Keywords
    audio-visual systems; image processing; learning systems; natural languages; speech processing; acoustic distance metric; audio-visual lexicon extraction; automatic continuous speech segmentation; co-occurring input; infant-directed image corpus; infant-directed speech corpus; mutual information; speech-vision integration; spoken input; spoken words; visual categories; visual distance metric; visual input; word boundaries; word learning; Acoustic signal detection; Computational modeling; Image segmentation; Laboratories; Learning systems; Mutual information; Natural languages; Shape; Speech; Streaming media
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
  • Conference_Location
    Istanbul
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-6293-4
  • Type
    conf
  • DOI
    10.1109/ICASSP.2000.859317
  • Filename
    859317