• DocumentCode
    60051
  • Title

    Multiparty Interaction Understanding Using Smart Multimodal Digital Signage

  • Author

    Tung, Tony ; Gomez, Raquel ; Kawahara, Toshio ; Matsuyama, Takashi

  • Author_Institution
    Acad. Center for Media Studies & the Grad. Sch. of Inf., Kyoto Univ., Kyoto, Japan
  • Volume
    44
  • Issue
    5
  • fYear
    2014
  • fDate
    Oct. 2014
  • Firstpage
    625
  • Lastpage
    637
  • Abstract
    This paper presents a novel multimodal system designed for multi-party human-human interaction analysis. The design of human-machine interfaces for multiple users is challenging because simultaneous processing of actions and reactions has to be consistent. The proposed system consists of a large display equipped with multiple sensing devices: microphone array, HD video cameras, and depth sensors. Multiple users positioned in front of the panel freely interact using voice or gesture while looking at the displayed content, without wearing any particular devices (such as motion capture sensors or head-mounted devices). Acoustic and visual information is captured and processed jointly using established and state-of-the-art techniques to obtain individual speech and gaze direction. Furthermore, a new framework is proposed to model A/V multimodal interaction between verbal and nonverbal communication events. Dynamics of audio signals obtained from speaker diarization and head poses extracted from video images are modeled using hybrid dynamical systems (HDS). We show that HDS temporal structure characteristics can be used for multimodal interaction level estimation, which is useful feedback that can help to improve multi-party communication experience. Experimental results using synthetic and real-world datasets of group communication such as poster presentations show the feasibility of the proposed multimodal system.
  • Keywords
    audio signal processing; audio-visual systems; human computer interaction; microphone arrays; ubiquitous computing; HD video camera; HDS temporal structure; acoustic information; audio signal; depth sensor; human-human interaction analysis; human-machine interface; hybrid dynamical system; microphone array; multimodal interaction level estimation; multiparty interaction; smart multimodal digital signage; speaker diarization; visual information; Arrays; Cameras; High definition video; Microphones; Sensors; Speech; Speech processing; Human-machine system; multimodal interaction dynamics; smart digital signage;
  • fLanguage
    English
  • Journal_Title
    Human-Machine Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    2168-2291
  • Type

    jour

  • DOI
    10.1109/THMS.2014.2326873
  • Filename
    6839020