Author_Institution :
Univ. of Adelaide, Adelaide, SA, Australia
Abstract :
Summary form only given. Visual Simultaneous Localisation and Mapping (SLAM) is the process whereby a camera builds a map of a previously unseen environment and localises itself with respect to that environment, often in real time. Although there has been remarkable progress, and it is now possible, for example, to build dense maps in real time using high-end commodity hardware, most SLAM research has remained rooted in geometry. Geometric representations are limited, though: they rarely encode higher-level information that describes the scene, they are wasteful of storage, and they are brittle to change. I am therefore interested in extending SLAM beyond geometry to more semantically meaningful representations, in which a scene can be segmented into components and compactly described and represented. In this talk I will describe work in my group from the last few years that progresses towards this end. First, I will describe a system that combines detection, segmentation and tracking of instances of a known 3D object class with a system for real-time dense visual mapping. We learn a low-dimensional space of shapes to encode prior shape knowledge, and then perform simultaneous segmentation and tracking of an instance of a 3D shape class by optimising for shape and pose. This tracking methodology is then incorporated into a system for real-time dense visual mapping of a scene, demonstrating that prior knowledge of objects can be incorporated into a SLAM map to improve map fidelity. Second, I will present work that shows how structured learning methods can be used for scene understanding within SLAM. I will discuss a method for reconstructing building interiors using a combination of point-based parallel tracking and mapping with single-view reconstruction techniques. Here, by leveraging prior knowledge of shape (namely that the building interior conforms to a "Manhattan" model), we develop an efficient global optimisation method for inferring key semantic properties of the scene, namely its boundaries: the floor, ceiling and walls. This method makes use of single-view pixel-level texture cues, as well as opportunistic use of 3D information such as photo-consistency and sparse 3D map data. Finally, I will discuss progress in using structured learning methods for interpreting RGB-D data using Decision Tree Fields.
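
As an illustration of the shape-and-pose optimisation described above, the following Python sketch (an assumption for exposition, not the system presented in the talk) learns a low-dimensional shape space by PCA over example signed-distance volumes, then fits shape coefficients and a rigid pose to observed surface points by minimising the signed-distance residual. All function names, the nearest-voxel sampling, and the single-angle pose parameterisation are simplifications chosen for brevity.

    import numpy as np
    from scipy.optimize import minimize

    def learn_shape_space(sdf_examples, k=5):
        # PCA over vectorised signed-distance volumes: mean plus k principal shapes.
        X = np.stack([s.ravel() for s in sdf_examples])
        mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        return mean, Vt[:k]

    def decode_sdf(mean, basis, coeffs, vol_shape):
        # Reconstruct a voxel signed-distance volume from shape coefficients.
        return (mean + coeffs @ basis).reshape(vol_shape)

    def sample_sdf(vol, pts):
        # Nearest-voxel lookup, clamped to the volume; trilinear interpolation
        # would be used in practice.
        idx = np.clip(np.round(pts).astype(int), 0, np.array(vol.shape) - 1)
        return vol[idx[:, 0], idx[:, 1], idx[:, 2]]

    def fit_shape_and_pose(points, mean, basis, vol_shape, k=5):
        # Jointly optimise shape coefficients and a toy pose (one rotation angle
        # plus a 3D translation; a full SE(3) parameterisation would be used in
        # practice).
        def residual(params):
            coeffs, theta, t = params[:k], params[k], params[k + 1:]
            c, s = np.cos(theta), np.sin(theta)
            R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
            sdf = decode_sdf(mean, basis, coeffs, vol_shape)
            # Points on the observed surface should have signed distance zero.
            return np.sum(sample_sdf(sdf, points @ R.T + t) ** 2)
        return minimize(residual, np.zeros(k + 4), method="Powell")

A real-time system would instead minimise such a residual against live depth data with analytic derivatives and a full SE(3) pose; the sketch only shows how a single objective couples shape and pose.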
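
The Manhattan-world boundary inference can be sketched in a similarly hedged way. Assuming per-pixel label costs from a classifier are available, the toy dynamic program below (an illustrative simplification, not the talk's exact global optimisation) recovers a floor boundary row for each image column by trading pixel label costs against boundary smoothness; the ceiling boundary would be recovered symmetrically, and the real method would additionally fold cues such as photo-consistency and sparse 3D map points into the per-pixel costs.

    import numpy as np

    def floor_boundary_dp(cost_wall, cost_floor, smooth=1.0):
        # cost_wall[r, c]: cost of giving pixel (r, c) the 'wall' label;
        # cost_floor[r, c]: cost of the 'floor' label. Returns one boundary row
        # per column, penalising row jumps between neighbouring columns.
        H, W = cost_wall.shape
        above = np.cumsum(cost_wall, axis=0)                # rows 0..r as wall
        below = np.cumsum(cost_floor[::-1], axis=0)[::-1]   # rows r..H-1 as floor
        unary = np.vstack([np.zeros((1, W)), above[:-1]]) + below
        rows = np.arange(H)[:, None]
        jump = smooth * np.abs(rows - rows.T)               # transition costs
        dp, back = unary[:, 0].copy(), np.zeros((H, W), dtype=int)
        for c in range(1, W):                               # Viterbi over columns
            trans = dp[None, :] + jump
            back[:, c] = np.argmin(trans, axis=1)
            dp = unary[:, c] + trans[np.arange(H), back[:, c]]
        boundary = np.empty(W, dtype=int)
        boundary[-1] = int(np.argmin(dp))
        for c in range(W - 1, 0, -1):                       # backtrack
            boundary[c - 1] = back[boundary[c], c]
        return boundary

The dynamic program is exact for this one-boundary objective and runs in O(W·H²) time, which is one way such a per-column layout search can be made efficient and global rather than greedy.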
Keywords :
SLAM (robots); decision trees; image sensors; learning (artificial intelligence); optimisation; 3D object class; Manhattan model; SLAM map; camera; decision tree fields; dense maps; geometric representations; global optimisation method; key semantic properties; map fidelity; real-time dense visual mapping; real-time systems; semantic visual SLAM; structured learning methods; tracking methodology; visual simultaneous localisation and mapping; Semantics; Shape; Simultaneous localization and mapping; Three-dimensional displays; Visualization;