• DocumentCode
    3607174
  • Title
    Deep Head Pose: Gaze-Direction Estimation in Multimodal Video
  • Author
    Mukherjee, Sankha S.; Robertson, Neil Martin
  • Author_Institution
    Visionlab, Heriot-Watt Univ., Edinburgh, UK
  • Volume
    17
  • Issue
    11
  • fYear
    2015
  • Firstpage
    2094
  • Lastpage
    2107
  • Abstract
    In this paper we present a convolutional neural network (CNN)-based model for human head pose estimation in low-resolution multi-modal RGB-D data. We pose the problem as one of classification of human gazing direction. We further fine-tune a regressor based on the learned deep classifier. Next, we combine the two models (classification and regression) to estimate approximate regression confidence. We present state-of-the-art results on datasets that range from high-resolution human-robot interaction data (close-up faces plus depth information) to challenging low-resolution outdoor surveillance data. We build upon our robust head-pose estimation and further introduce a new visual attention model to recover interaction with the environment. Using this probabilistic model, we show that many higher-level scene understanding tasks, such as human-human/scene interaction detection, can be achieved. Our solution runs in real time on commercial hardware.
  • Keywords
    approximation theory; estimation theory; feedforward neural nets; human-robot interaction; image classification; image colour analysis; learning (artificial intelligence); pose estimation; probability; regression analysis; robot vision; video surveillance; CNN-based model; approximate regression confidence estimation; convolutional neural network-based model; deep head pose; gaze-direction estimation; high-resolution human robot interaction; human gazing direction classification; human head pose estimation; human-human interaction detection; learned deep classifier; low resolution outdoor surveillance data; low-resolution multimodal RGB-D data; multimodal video; probabilistic model; scene interaction detection; visual attention model; Estimation; Head; Human computer interaction; Image resolution; Magnetic heads; Surveillance; Visualization; Convolutional neural networks (CNNs); RGB-D; deep learning; gaze direction; head-pose;
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type
    jour
  • DOI
    10.1109/TMM.2015.2482819
  • Filename
    7279167