• DocumentCode
    438722
  • Title
    Pixels that sound
  • Author
    Kidron, Einat; Schechner, Yoav Y.; Elad, Michael
  • Author_Institution
    Dept. Electr. Eng., Israel Inst. Technol., Haifa, Israel
  • Volume
    1
  • fYear
    2005
  • fDate
    20-25 June 2005
  • Firstpage
    88
  • Abstract
    People and animals fuse auditory and visual information to obtain robust perception. A particular benefit of such cross-modal analysis is the ability to localize visual events associated with sound sources. We aim to achieve this using computer vision aided by a single microphone. Past efforts encountered problems stemming from the huge gap between the dimensions involved and the available data. This has led to solutions suffering from low spatio-temporal resolutions. We present a rigorous analysis of the fundamental problems associated with this task. Then, we present a stable and robust algorithm which overcomes past deficiencies. It grasps dynamic audio-visual events with high spatial resolution, and derives a unique solution. The algorithm effectively detects pixels that are associated with the sound, while filtering out other dynamic pixels. It is based on canonical correlation analysis (CCA), where we remove inherent ill-posedness by exploiting the typical spatial sparsity of audio-visual events. The algorithm is simple and efficient thanks to its reliance on linear programming and is free of user-defined parameters. To quantitatively assess the performance, we devise a localization criterion. The algorithm's capabilities were demonstrated in experiments, where it overcame substantial visual distractions and audio noise.
  • Keywords
    audio signal processing; audio-visual systems; computer vision; correlation theory; filtering theory; image resolution; video signal processing; audio noise; auditory-visual information fusion; canonical correlation analysis; computer vision aided; cross-modal analysis; dynamic audio-visual events; dynamic pixel filtering; linear programming; microphone; visual distractions; visual event localization; Animals; Auditory system; Computer science; Computer vision; Filtering; Fuses; Microphones; Motion analysis; Robustness; Spatial resolution
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on
  • ISSN
    1063-6919
  • Print_ISBN
    0-7695-2372-2
  • Type
    conf
  • DOI
    10.1109/CVPR.2005.274
  • Filename
    1467253