DocumentCode :
3424631
Title :
Learning the Visual Interpretation of Sentences
Author :
Zitnick, C. Lawrence; Parikh, D.; Vanderwende, Lucy
fYear :
2013
fDate :
1-8 Dec. 2013
Firstpage :
1681
Lastpage :
1688
Abstract :
Sentences that describe visual scenes contain a wide variety of information pertaining to the presence of objects, their attributes and their spatial relations. In this paper we learn the visual features that correspond to semantic phrases derived from sentences. Specifically, we extract predicate tuples that contain two nouns and a relation. The relation may take several forms, such as a verb, preposition, adjective or their combination. We model a scene using a Conditional Random Field (CRF) formulation where each node corresponds to an object, and the edges to their relations. We determine the potentials of the CRF using the tuples extracted from the sentences. We generate novel scenes depicting the sentences' visual meaning by sampling from the CRF. The CRF is also used to score a set of scenes for a text-based image retrieval task. Our results show we can generate (retrieve) scenes that convey the desired semantic meaning, even when scenes (queries) are described by multiple sentences. Significant improvement is found over several baseline approaches.
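A rough illustration of the scoring idea described above: a scene is treated as a set of objects (CRF nodes) and each extracted (noun, relation, noun) tuple contributes a pairwise potential over the corresponding pair of objects. The Python sketch below is not the authors' implementation; the tuple data, the hand-written potential functions, and all numbers are hypothetical stand-ins for potentials that the paper learns from sentence-scene pairs.

def relation_potential(relation, pos_a, pos_b):
    """Toy pairwise potential: large when the layout of (a, b) matches the relation."""
    dx = pos_b[0] - pos_a[0]
    dy = pos_b[1] - pos_a[1]
    dist = (dx ** 2 + dy ** 2) ** 0.5
    if relation == "next to":
        return 1.0 / (1.0 + dist)      # prefer objects that are close together
    if relation == "above":            # assumes y grows upward
        return 1.0 if dy < 0.0 else 0.1
    return 0.5                         # neutral score for unmodeled relations


def score_scene(scene, tuples):
    """Sum pairwise potentials over all extracted (noun, relation, noun) tuples."""
    total = 0.0
    for a, rel, b in tuples:
        if a in scene and b in scene:
            total += relation_potential(rel, scene[a], scene[b])
    return total


# Hypothetical tuple extracted from "A dog stands next to a tree."
tuples = [("dog", "next to", "tree")]

# A candidate scene: each object name mapped to a 2-D position.
scene = {"dog": (1.0, 0.0), "tree": (1.5, 0.0)}

print(score_scene(scene, tuples))  # higher scores indicate a better match

In the paper, analogous scores drive both tasks: sampling object placements from the CRF to generate a scene, and ranking candidate scenes against a text query for retrieval.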
Keywords :
image processing; image retrieval; text analysis; CRF; baseline approaches; conditional random field; queries; semantic phrases; sentence visual interpretation; spatial relations; text-based image retrieval task; visual features; visual scenes; Abstracts; Art; Computational modeling; Feature extraction; Radio access networks; Semantics; Visualization
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Computer Vision (ICCV), 2013 IEEE International Conference on
Conference_Location :
Sydney, NSW
ISSN :
1550-5499
Type :
conf
DOI :
10.1109/ICCV.2013.211
Filename :
6751319