• DocumentCode
    2449725
  • Title

    Look who's talking: speaker detection using video and audio correlation

  • Author

    Cutler, Ross ; Davis, Larry

  • Author_Institution
    Inst. for Adv. Comput. Studies, Maryland Univ., College Park, MD, USA
  • Volume
    3
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    1589
  • Abstract
    The visual motion of the mouth and the corresponding audio data generated when a person speaks are highly correlated. This fact has been exploited for lip/speech-reading and for improving speech recognition. We describe a method of automatically detecting a talking person (both spatially and temporally) using video and audio data from a single microphone. The audio-visual correlation is learned using a time-delayed neural network, which is then used to perform a spatio-temporal search for a speaking person. Applications include videoconferencing, video indexing, and improving human-computer interaction (HCI). An example HCI application is provided.
  • Keywords
    audio-visual systems; computer vision; correlation methods; delays; gesture recognition; learning (artificial intelligence); neural nets; speaker recognition; speech-based user interfaces; audio correlation; audio-visual correlation learning; human-computer interaction; lip-reading; microphone; mouth visual motion; spatio-temporal search; speaker detection; speech recognition; speech-reading; talking person detection; time delayed neural network; video correlation; video indexing; videoconferencing; Animation; Application software; Face detection; Human computer interaction; Indexing; Microphones; Mouth; Neural networks; Speech recognition; Videoconference;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on
  • Conference_Location
    New York, NY
  • Print_ISBN
    0-7803-6536-4
  • Type

    conf

  • DOI
    10.1109/ICME.2000.871073
  • Filename
    871073