Title :
Scene Parsing by Integrating Function, Geometry and Appearance Models
Author :
Yibiao Zhao ; Song-Chun Zhu
Author_Institution :
Dept. of Stat., Univ. of California, Los Angeles, Los Angeles, CA, USA
Abstract :
Indoor functional objects exhibit large view and appearance variations and are therefore difficult to recognize with the traditional appearance-based classification paradigm. In this paper, we present an algorithm to parse indoor images based on two observations: i) functionality is the most essential property defining an indoor object, e.g. "a chair to sit on"; ii) the geometry (3D shape) of an object is designed to serve its function. We formulate the nature of object function as a stochastic grammar model. This model characterizes a joint distribution over the function-geometry-appearance (FGA) hierarchy. The hierarchical structure includes a scene category, functional groups, functional objects, functional parts and 3D geometric shapes. We use a simulated annealing MCMC algorithm to find the maximum a posteriori (MAP) solution, i.e. a parse tree. We design four data-driven steps to accelerate the search in the FGA space: i) group the line segments into 3D primitive shapes, ii) assign functional labels to these 3D primitive shapes, iii) fill in missing objects/parts according to the functional labels, and iv) synthesize 2D segmentation maps and verify the current parse tree by the Metropolis-Hastings acceptance probability. Experimental results on several challenging indoor datasets demonstrate that the proposed approach not only significantly widens the scope of indoor scene parsing algorithms from segmentation and 3D recovery to functional object recognition, but also yields improved overall performance.
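The search strategy named in the abstract, simulated annealing with a Metropolis-Hastings acceptance test, can be illustrated with a minimal generic sketch. This is not the authors' implementation: the function `simulated_annealing`, the toy integer state, the quadratic `energy`, the symmetric `propose` step and the geometric cooling schedule are all assumptions chosen for illustration. In the paper the state would be a parse tree and the energy a negative log-posterior over the FGA hierarchy.

```python
import math
import random


def simulated_annealing(energy, propose, init, steps=5000,
                        t0=1.0, t_end=0.01, seed=0):
    """Generic simulated annealing with a Metropolis-Hastings acceptance test.

    Assumes a symmetric proposal, so the acceptance probability reduces to
    min(1, exp(-(E_new - E_old) / T)). All names here are hypothetical.
    """
    rng = random.Random(seed)
    state, e = init, energy(init)
    best, best_e = state, e
    for k in range(steps):
        # Geometric cooling from t0 down to t_end (an assumed schedule).
        t = t0 * (t_end / t0) ** (k / max(steps - 1, 1))
        cand = propose(state, rng)
        cand_e = energy(cand)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-(increase in energy) / temperature).
        accept_prob = 1.0 if cand_e <= e else math.exp(-(cand_e - e) / t)
        if rng.random() < accept_prob:
            state, e = cand, cand_e
            if e < best_e:
                best, best_e = state, e
    return best, best_e


# Toy stand-in for a negative log-posterior: minimum at x = 7.
def energy(x):
    return (x - 7) ** 2


# Symmetric random-walk proposal on the integers.
def propose(x, rng):
    return x + rng.choice((-1, 1))


best, best_e = simulated_annealing(energy, propose, init=0)
print(best, best_e)
```

Low-energy states correspond to high posterior probability, so the lowest-energy state visited plays the role of the MAP estimate in this toy setting.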
Keywords :
Markov processes; Monte Carlo methods; geometry; image classification; image segmentation; maximum likelihood estimation; object recognition; simulated annealing; trees (mathematics); 2D segmentation map synthesis; 3D geometric shapes; 3D primitive shapes; 3D recovery; FGA hierarchy; MAP solution; Metropolis-Hastings acceptance probability; appearance models; appearance-based classification paradigm; function model; function-geometry-appearance hierarchy; functional groups; functional object recognition; functional parts; geometry model; hierarchical structure; indoor datasets; indoor functional objects; indoor image parsing; indoor scene parsing algorithm; maximum a posteriori solution; parse tree; scene category; simulated annealing MCMC algorithm; stochastic grammar model; Cameras; Computational modeling; Geometry; Grammar; Shape; Solid modeling; Three-dimensional displays; affordance; function; functionality; image parsing; scene parsing; stochastic scene grammar;
Conference_Title :
Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on
Conference_Location :
Portland, OR
DOI :
10.1109/CVPR.2013.401