DocumentCode :
3579910
Title :
Towards semantic visual SLAM
Author :
Reid, Ian
Author_Institution :
Univ. of Adelaide, Adelaide, SA, Australia
fYear :
2014
Firstpage :
1
Lastpage :
1
Abstract :
Summary form only given. Visual Simultaneous Localisation and Mapping is the process whereby a camera builds a map of a previously unseen environment and localises itself with respect to that environment, often in real-time. Although there has been remarkable progress, and it is now possible, for example, to build dense maps in real-time using high-end commodity hardware, most SLAM research has remained rooted in geometry. Geometric representations are limited, though, in that they rarely encode higher-level information describing the scene, are wasteful of storage, and are brittle to changes. I am therefore interested in extending SLAM beyond geometry to more semantically meaningful representations in which a scene can be segmented into components and compactly described and represented. In this talk I will describe work in my group from the last few years that progresses towards this goal. First, I will describe a system that combines detection, segmentation and tracking of instances of a known 3D object class with a system for real-time dense visual mapping. We learn a low-dimensional space of shapes to encode prior shape knowledge, and then perform simultaneous segmentation and tracking of an instance of a 3D shape class by optimising for shape and pose. This tracking methodology is then incorporated into a system for real-time dense visual mapping of a scene, demonstrating that prior knowledge of objects can be incorporated into a SLAM map to improve map fidelity. Second, I will present work that shows how structured learning methods can be used within SLAM scene-understanding methods. I will discuss a method for reconstructing building interiors using a combination of point-based parallel tracking and mapping with single-view reconstruction techniques.
Here, by leveraging prior knowledge of shape (namely that the building interior conforms to a "Manhattan" model), we develop an efficient global optimisation method for inferring key semantic properties of the scene, namely its boundaries: the floor, ceiling and walls. This method makes use of single-view pixel-level texture cues, as well as opportunistic use of 3D information such as photo-consistency and sparse 3D map data. Finally, I will discuss progress in using structured learning methods for interpreting RGB-D data using Decision Tree Fields.
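The abstract's first contribution, "optimising for shape and pose" over a learned low-dimensional shape space, can be illustrated with a toy sketch. This is not the talk's actual system: it assumes a linear (PCA-like) shape model, a translation-only pose, and synthetic 2D contour points, and all function and variable names here are hypothetical.

```python
import numpy as np

def fit_shape_and_pose(observed, mean, basis, iters=50):
    """Alternate closed-form updates for translation t and shape coefficients z.

    observed : (N, 2) observed contour points
    mean     : (N, 2) mean shape of the learned model
    basis    : (K, N, 2) shape-variation modes (the low-dimensional shape space)
    """
    K = basis.shape[0]
    B = basis.reshape(K, -1).T          # (2N, K) flattened shape modes
    z = np.zeros(K)                     # shape coefficients
    t = np.zeros(2)                     # pose (translation only, for simplicity)
    for _ in range(iters):
        shape = mean + (B @ z).reshape(-1, 2)
        t = (observed - shape).mean(axis=0)           # best translation given z
        resid = (observed - t - mean).reshape(-1)     # best z given t (least squares)
        z, *_ = np.linalg.lstsq(B, resid, rcond=None)
    return z, t

# Synthetic example: a square whose single shape mode inflates it about the origin.
mean = np.array([[-1, -1], [1, -1], [1, 1], [-1, 1]], float)
basis = mean[None]                      # K = 1: one inflation mode
z_true, t_true = np.array([0.3]), np.array([2.0, -1.0])
observed = mean + z_true * basis[0] + t_true

z_est, t_est = fit_shape_and_pose(observed, mean, basis)
```

In the real system the "pose" would be a full 6-DoF camera/object transform and the residual would be an image-based segmentation/tracking cost, but the alternation between pose and shape coefficients follows the same pattern.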
Keywords :
SLAM (robots); decision trees; image sensors; learning (artificial intelligence); optimisation; 3D object class; Manhattan model; SLAM map; camera; decision tree fields; dense maps; geometric representations; global optimisation method; key semantic properties; map fidelity; real-time dense visual mapping; real-time systems; semantic visual SLAM; structured learning methods; tracking methodology; visual simultaneous localisation and mapping; Australia; Real-time systems; Semantics; Shape; Simultaneous localization and mapping; Three-dimensional displays; Visualization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2014 13th International Conference on Control Automation Robotics & Vision (ICARCV)
Type :
conf
DOI :
10.1109/ICARCV.2014.7064267
Filename :
7064267