  • DocumentCode
    2915504
  • Title
    Semantic structure from motion
  • Author
    Bao, Sid Yingze; Savarese, Silvio
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Michigan at Ann Arbor, Ann Arbor, MI, USA
  • fYear
    2011
  • fDate
    20-25 June 2011
  • Firstpage
    2025
  • Lastpage
    2032
  • Abstract
    Conventional rigid structure from motion (SFM) addresses the problem of recovering the camera parameters (motion) and the 3D locations (structure) of scene points, given observed 2D image feature points. In this paper, we propose a new formulation called Semantic Structure From Motion (SSFM). In addition to the geometrical constraints provided by SFM, SSFM takes advantage of both semantic and geometrical properties associated with objects in the scene (Fig. 1). These properties allow us to recover not only the structure and motion but also the 3D locations, poses, and categories of objects in the scene. We cast this problem as a maximum-likelihood problem where geometry (cameras, points, objects) and semantic information (object classes) are simultaneously estimated. The key intuition is that, in addition to image features, the measurements of objects across views provide additional geometrical constraints that relate cameras and scene parameters. These constraints make the geometry estimation process more robust and, in turn, make object detection more accurate. Our framework has the unique ability to: i) estimate camera poses only from object detections, ii) enhance camera pose estimation, compared to feature-point-based SFM algorithms, iii) improve object detections given multiple uncalibrated images, compared to independently detecting objects in single images. Extensive quantitative results on three datasets - LiDAR cars, street-view pedestrians, and Kinect office desktop - verify our theoretical claims.
  • Keywords
    cameras; feature extraction; image motion analysis; maximum likelihood estimation; natural scenes; object detection; optical radar; 2D image feature point; 3D location; Kinect office desktop; LiDAR car; SSFM; camera image parameter; camera parameter; camera pose estimation; feature-point-based SFM algorithm; geometrical constraint; geometry estimation process; image feature; max-likelihood problem; multiple image uncalibrated image; object detection; scene point; semantic information; semantic structure from motion; street-view pedestrians; Cameras; Detectors; Maximum likelihood estimation; Object detection; Semantics; Three dimensional displays;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on
  • Conference_Location
    Providence, RI
  • ISSN
    1063-6919
  • Print_ISBN
    978-1-4577-0394-2
  • Type
    conf
  • DOI
    10.1109/CVPR.2011.5995462
  • Filename
    5995462