• DocumentCode
    3407915
  • Title
    Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora
  • Author
    Socher, Richard; Fei-Fei, Li
  • Author_Institution
    Dept. of Comput. Sci., Stanford Univ., Stanford, CA, USA
  • fYear
    2010
  • fDate
    13-18 June 2010
  • Firstpage
    966
  • Lastpage
    973
  • Abstract
    We propose a semi-supervised model that segments and annotates images using very few labeled images and a large unaligned text corpus to relate image regions to text labels. Given photos of a sports event, all that is necessary to provide a pixel-level labeling of objects and background is a set of newspaper articles about this sport and one to five labeled images. Our model is motivated by the observation that words in text corpora share certain context and feature similarities with visual objects. We describe images using visual words, a new region-based representation. The proposed model is based on kernelized canonical correlation analysis, which finds a mapping between visual and textual words by projecting them into a latent meaning space. Kernels are derived from context and adjective features inside the respective visual and textual domains. We apply our method to a challenging dataset and rely on articles from the New York Times for textual features. Our model outperforms the state of the art in annotation. In segmentation, it compares favorably with other methods that use significantly more labeled training data.
  • Keywords
    image representation; image segmentation; learning (artificial intelligence); text analysis; connecting modality; image annotation; image regions; kernelized canonical correlation analysis; labeled images; newspaper articles; pixel-level labeling; region-based representation; semisupervised model; semisupervised segmentation; sports event; textual features; textual words; unaligned text corpora; visual words; Computer science; Context modeling; Humans; Image retrieval; Image segmentation; Joining processes; Kernel; Labeling; Pixel; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on
  • Conference_Location
    San Francisco, CA
  • ISSN
    1063-6919
  • Print_ISBN
    978-1-4244-6984-0
  • Type
    conf
  • DOI
    10.1109/CVPR.2010.5540112
  • Filename
    5540112