  • DocumentCode
    3332465
  • Title
    A Sentence Is Worth a Thousand Pixels
  • Author
    Fidler, Sanja ; Sharma, Abhishek ; Urtasun, Raquel
  • Author_Institution
    TTI, Chicago, IL, USA
  • fYear
    2013
  • fDate
    23-28 June 2013
  • Firstpage
    1995
  • Lastpage
    2002
  • Abstract
    We are interested in holistic scene understanding where images are accompanied by text in the form of complex sentential descriptions. We propose a holistic conditional random field model for semantic parsing which reasons jointly about which objects are present in the scene, their spatial extent, and the semantic segmentation, and employs both text and image information as input. We automatically parse the sentences, extract objects and their relationships, and incorporate them into the model, both via potentials and by re-ranking candidate detections. We demonstrate the effectiveness of our approach on the challenging UIUC sentences dataset and show segmentation improvements of 12.5% over the visual-only model and detection improvements of 5% AP over deformable part-based models.
  • Keywords
    image segmentation; object detection; text analysis; UIUC sentences dataset; complex sentential descriptions; holistic scene; image information; object extraction; semantic parsing; semantic segmentation; spatial extent; thousand pixels; Boats; Deformable models; Image recognition; Image segmentation; Object detection; Semantics; Visualization; Holistic scene models; Images and text; Scene understanding;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on
  • Conference_Location
    Portland, OR
  • ISSN
    1063-6919
  • Type
    conf
  • DOI
    10.1109/CVPR.2013.260
  • Filename
    6619104