3D Scene Understanding by Voxel-CRF

Author

Kim, Byung-soo ; Kohli, Pushmeet ; Savarese, Silvio

Author_Institution

Univ. of Michigan, Ann Arbor, MI, USA

fYear

2013

fDate

1-8 Dec. 2013

Firstpage

1425

Lastpage

1432

Abstract

Scene understanding is an important yet very challenging problem in computer vision. In the past few years, researchers have taken advantage of the recent spread of depth-RGB (RGB-D) cameras to help simplify the problem of inferring scene semantics. However, while the added 3D geometry is certainly useful for segmenting out objects with different depth values, it also adds complications: the 3D geometry is often incorrect because of noisy depth measurements, and the actual 3D extent of the objects is usually unknown because of occlusions. In this paper we propose a new method that allows us to jointly refine the 3D reconstruction of the scene (raw depth values) while accurately segmenting out the objects or scene elements from the 3D reconstruction. This is achieved by introducing a new model, which we call Voxel-CRF. The Voxel-CRF model is based on the idea of constructing a conditional random field over a 3D volume of interest that captures the semantic and 3D geometric relationships among different elements (voxels) of the scene. Such a model allows us to jointly estimate (1) a dense voxel-based 3D reconstruction and (2) the semantic labels associated with each voxel, even in the presence of partial occlusions, using an approximate yet efficient inference strategy. We evaluated our method on the challenging NYU Depth dataset (Versions 1 and 2). Experimental results show that our method achieves competitive accuracy in inferring scene semantics and visually appealing results in improving the quality of the 3D reconstruction. We also demonstrate an interesting application of object removal and scene completion from RGB-D images.
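The abstract describes the Voxel-CRF only at a high level. As a rough illustration of the general idea of a conditional random field over a voxel grid, the following is a minimal Python/NumPy sketch under stated assumptions: each voxel takes a single label, with label 0 standing for "empty" (a simplification of the paper's joint estimation of occupancy and semantics); unary costs are given; the pairwise term is a Potts penalty over 6-connected neighbours; and inference uses iterated conditional modes (ICM). ICM is a simple stand-in chosen here and is not the approximate inference strategy used in the paper. The function names `crf_energy` and `icm`, the `weight` parameter, and the toy data are hypothetical.

```python
import numpy as np


def crf_energy(labels, unary, weight):
    """Energy = sum of unary costs + Potts penalty on 6-connected voxel pairs."""
    # unary has shape (X, Y, Z, L); pick the cost of each voxel's chosen label
    e = np.take_along_axis(unary, labels[..., None], axis=-1).sum()
    for axis in range(3):
        # count neighbouring pairs along this axis whose labels disagree
        e += weight * (np.diff(labels, axis=axis) != 0).sum()
    return float(e)


def icm(unary, weight, iters=10):
    """Iterated conditional modes: greedily relabel each voxel given its neighbours."""
    X, Y, Z, _ = unary.shape
    labels = unary.argmin(axis=-1)  # start from the unary-only solution
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for _ in range(iters):
        changed = False
        for x in range(X):
            for y in range(Y):
                for z in range(Z):
                    cost = unary[x, y, z].copy()
                    for dx, dy, dz in offsets:
                        nx, ny, nz = x + dx, y + dy, z + dz
                        if 0 <= nx < X and 0 <= ny < Y and 0 <= nz < Z:
                            # Potts term: every label except the neighbour's pays `weight`
                            cost += weight
                            cost[labels[nx, ny, nz]] -= weight
                    best = int(cost.argmin())
                    # switch only on strict improvement, so the energy never increases
                    if cost[best] < cost[labels[x, y, z]]:
                        labels[x, y, z] = best
                        changed = True
        if not changed:
            break
    return labels


# Toy example (hypothetical data): 4x4x4 grid, labels 0 = empty, 1 = object
truth = np.zeros((4, 4, 4), dtype=int)
truth[:2] = 1  # voxels with x < 2 belong to the object
# cost 0 for the true label, 1 for the other label
unary = np.stack([truth.astype(float), 1.0 - truth], axis=-1)
unary[1, 1, 1] = [0.0, 1.0]  # one "noisy" voxel whose unary wrongly prefers empty

e_before = crf_energy(unary.argmin(axis=-1), unary, weight=0.5)
labels = icm(unary, weight=0.5)
e_after = crf_energy(labels, unary, weight=0.5)
print(e_before, e_after)  # prints 10.0 9.0
```

In this toy run the pairwise term overrides the single misleading unary cost, because five of the noisy voxel's six neighbours are labelled as object. This loosely mirrors how contextual relationships among voxels can compensate for noisy depth measurements. The real model's potentials and inference are richer than this sketch.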

Keywords

cameras; computer vision; geometry; image colour analysis; image reconstruction; inference mechanisms; random processes; solid modelling; 3D geometric relationships; 3D scene reconstruction; 3D scene understanding; NYU depth dataset; RGB-D images; Voxel-CRF model; conditional random field; depth-RGB cameras; inference strategy; noisy depth measurements; object removal; object segmentation; occlusions; scene semantics; semantic labels; voxel-based 3D reconstruction; Labeling; Noise measurement; Semantics; Solid modeling; Three-dimensional displays; 3D reconstruction; RGB-D; Scene understanding

fLanguage

English

Publisher

IEEE

Conference_Title

Computer Vision (ICCV), 2013 IEEE International Conference on

Conference_Location

Sydney, NSW

ISSN

1550-5499

Type

conf

DOI

10.1109/ICCV.2013.180

Filename

6751287