• DocumentCode
    3008405
  • Title

    Improving real-time voice activity detection for perceptual robotic control in noisy environment

  • Author

    Shih, Po-Yi ; Lin, Po-Chuan ; Wang, Jhing-Fa ; Chen, You-Zen

  • Author_Institution
    Dept. of Electr. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
  • fYear
    2011
  • fDate
    21-24 Nov. 2011
  • Firstpage
    1040
  • Lastpage
    1044
  • Abstract
    To enable natural human-robot interaction, robots must acquire the skills to detect and meaningfully integrate information from multiple modalities. In this paper, we focus on efficient voice activity detection (VAD) in the context of a multi-sensory robot device that uses audio activity information to drive perceptual functions, yielding natural and intuitive responses to human behavior for robotic control. We propose an improved VAD algorithm based on frequency analysis, using harmonic spectral peaks (HSP), spectral flatness (SF), and spectral entropy (SE). The performance and advantages of the proposed method are compared with those of five common VAD algorithms in three noise environments. We also describe how the method is integrated into our robot, Robert, and briefly introduce the perceptual multimodal framework for robotic control, in which visual-audio modalities, such as speech content recognition and visual object identification or tracking, can be easily integrated.
  • Keywords
    control engineering computing; human-robot interaction; mobile robots; object tracking; sensor fusion; speech recognition; Robert robot; VAD algorithm; audio activity information; frequency analysis; harmonic spectral peaks; multisensory robot device; noisy environment; perceptual function; perceptual robotic control; spectral entropy; spectral flatness; speech content recognition; visual object identification; visual object tracking; visual-audio modalities; voice activity detection; Face recognition; Noise measurement; Robots; Signal to noise ratio; Speech; TV; multiple modalities; robot control; spectral entropy (SE); spectral flatness (SF); spectral peaks (HSP); voice activity detection (VAD)
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    TENCON 2011 - 2011 IEEE Region 10 Conference
  • Conference_Location
    Bali
  • ISSN
    2159-3442
  • Print_ISBN
    978-1-4577-0256-3
  • Type

    conf

  • DOI
    10.1109/TENCON.2011.6129269
  • Filename
    6129269