DocumentCode
3008405
Title
Improving real-time voice activity detection for perceptual robotic control in noisy environment
Author
Shih, Po-Yi ; Lin, Po-Chuan ; Wang, Jhing-Fa ; Chen, You-Zen
Author_Institution
Dept. of Electr. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
fYear
2011
fDate
21-24 Nov. 2011
Firstpage
1040
Lastpage
1044
Abstract
To enable natural human-robot interaction, robots must acquire the skills to detect and meaningfully integrate information from multiple modalities. In this paper, we focus on efficient voice activity detection (VAD) for a multi-sensory robot device that uses audio activity information to drive its perceptual functions, yielding natural and intuitive responses to human behavior for robotic control. We propose an improved VAD algorithm based on frequency analysis that combines harmonic spectral peaks (HSP), spectral flatness (SF), and spectral entropy (SE). We compare the performance and advantages of the proposed method with those of five common VAD algorithms in three noise environments. We also describe how the method is integrated into our robot, Robert, and briefly introduce the perceptual multimodal framework for robotic control, in which audio-visual modalities such as speech content recognition and visual object identification or tracking can be easily integrated.
Keywords
control engineering computing; human-robot interaction; mobile robots; object tracking; sensor fusion; speech recognition; Robert robot; VAD algorithm; audio activity information; frequency analysis; harmonic spectral peaks; human-robot interaction; multisensory robot device; noisy environment; perceptual function; perceptual robotic control; spectral entropy; spectral flatness; speech content recognition; visual object identification; visual object tracking; visual-audio modalities; voice activity detection; Face recognition; Noise measurement; Robots; Signal to noise ratio; Speech; TV; frequency analysis; human-robot interaction; multiple modalities; robot control; spectral entropy (SE); spectral flatness (SF); spectral peaks (HSP); voice activity detection (VAD);
fLanguage
English
Publisher
IEEE
Conference_Titel
TENCON 2011 - 2011 IEEE Region 10 Conference
Conference_Location
Bali
ISSN
2159-3442
Print_ISBN
978-1-4577-0256-3
Type
conf
DOI
10.1109/TENCON.2011.6129269
Filename
6129269