DocumentCode
3332465
Title
A Sentence Is Worth a Thousand Pixels
Author
Fidler, Sanja ; Sharma, Ashok ; Urtasun, Raquel
Author_Institution
TTI, Chicago, IL, USA
fYear
2013
fDate
23-28 June 2013
Firstpage
1995
Lastpage
2002
Abstract
We are interested in holistic scene understanding where images are accompanied by text in the form of complex sentential descriptions. We propose a holistic conditional random field model for semantic parsing which reasons jointly about which objects are present in the scene, their spatial extent, and the semantic segmentation, and which employs both text and image information as input. We automatically parse the sentences, extract objects and their relationships, and incorporate them into the model both via potentials and by re-ranking candidate detections. We demonstrate the effectiveness of our approach on the challenging UIUC sentences dataset and show segmentation improvements of 12.5% over the visual-only model and detection improvements of 5% AP over deformable part-based models.
Keywords
image segmentation; object detection; text analysis; UIUC sentences dataset; complex sentential descriptions; holistic scene; image information; object extraction; semantic parsing; semantic segmentation; spatial extent; thousand pixels; Boats; Deformable models; Image recognition; Image segmentation; Object detection; Semantics; Visualization; Holistic scene models; Images and text; Scene understanding;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on
Conference_Location
Portland, OR
ISSN
1063-6919
Type
conf
DOI
10.1109/CVPR.2013.260
Filename
6619104