Title :
Super-Audible Voice Activity Detection
Author :
McLoughlin, Ian Vince
Author_Institution :
National Engineering Laboratory of Speech and Language Information Processing, University of Science and Technology of China, Hefei, China
Abstract :
In this paper, reflected sound of frequency just above the audible range is used to detect speech activity. The active signal used is inaudible to humans, readily generated by the typical audio circuitry and components found in mobile telephones, and is robust to background sounds such as nearby voices. In use, the system relies upon a wideband excitation signal emitted from a loudspeaker located near the lips, which reflects from the mouth region and is then captured by a nearby microphone. The state of the lip opening is evaluated periodically by tracking the resonance patterns in the reflected excitation signal. When the lips are open, deep and complex resonances are formed as energy propagates into and then reflects out from the open mouth and vocal tract, with resonance depth being related to the open lip area. When the lips are closed, these resonance patterns are absent. The presence of the resonances can thus serve as a low complexity detection measure. The technique is evaluated for multiple users in terms of sensitivity to source placement and sensor placement. Voice activity detection performance using this measure is further evaluated in the presence of realistic wideband acoustic background noise, as well as artificially added noise. The system is shown to be relatively insensitive to sensor placement, highly insensitive to background noise, and able to achieve greater than 90% voice activity detection accuracy. The technique is even suitable when a subject is whispering in the presence of much louder multi-speaker babble. The technique has potential for speech-based systems operating in high noise environments as well as in silent speech interfaces, whisper-input systems and voice prostheses for speech-impaired users.
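The detection idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 48 kHz sample rate, the 20 to 23 kHz linear chirp, the simulated mouth response (a single Gaussian-shaped spectral notch), the peak-to-trough depth statistic, and the 6 dB threshold. The sketch only shows the general principle that an open mouth imprints resonance structure on the reflected excitation while closed lips return a comparatively flat spectrum.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the paper.
FS = 48_000                      # sample rate in Hz
F_LO, F_HI = 20_000.0, 23_000.0  # assumed super-audible excitation band in Hz
DUR = 0.02                       # chirp duration in seconds
DEPTH_THRESHOLD_DB = 6.0         # hypothetical decision threshold


def make_chirp(fs=FS, f_lo=F_LO, f_hi=F_HI, dur=DUR):
    """Return a linear chirp sweeping from f_lo to f_hi."""
    t = np.arange(int(fs * dur)) / fs
    phase = 2 * np.pi * (f_lo * t + 0.5 * (f_hi - f_lo) / dur * t ** 2)
    return np.sin(phase)


def simulate_reflection(chirp, mouth_open, rng, fs=FS):
    """Toy reflection model (hypothetical): flat attenuation when the lips
    are closed, plus one deep spectral notch when the mouth is open."""
    freqs = np.fft.rfftfreq(len(chirp), 1 / fs)
    response = np.full(freqs.shape, 0.5)
    if mouth_open:
        notch = np.exp(-((freqs - 21_500.0) / 200.0) ** 2)
        response = 1.0 - 0.9 * notch
    reflected = np.fft.irfft(np.fft.rfft(chirp) * response, n=len(chirp))
    # Small additive sensor noise.
    return reflected + 0.01 * rng.standard_normal(len(chirp))


def band_spectrum_db(x, fs=FS, f_lo=F_LO, f_hi=F_HI, guard=300.0):
    """Magnitude spectrum in dB, restricted to the interior of the band."""
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    mask = (freqs >= f_lo + guard) & (freqs <= f_hi - guard)
    return 20 * np.log10(spec[mask] + 1e-12)


def resonance_depth_db(reflected, reference):
    """Peak-to-trough range of the reflected spectrum relative to the
    emitted spectrum, in dB."""
    ratio = band_spectrum_db(reflected) - band_spectrum_db(reference)
    return float(ratio.max() - ratio.min())


def detect_lips_open(reflected, reference, threshold_db=DEPTH_THRESHOLD_DB):
    """Declare the lips open when the resonance depth exceeds the threshold."""
    return resonance_depth_db(reflected, reference) > threshold_db
```

In this toy model, calling `detect_lips_open(simulate_reflection(chirp, True, rng), chirp)` with `chirp = make_chirp()` and `rng = np.random.default_rng(0)` should flag the open-mouth case, while the closed-lip case stays below the threshold. A real system would repeat this measurement periodically and would need a threshold calibrated to the actual hardware and placement.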
Keywords :
acoustic noise; acoustic signal detection; audio signal processing; loudspeakers; prosthetics; sensor placement; smart phones; speech synthesis; target tracking; artificial added noise; audio circuitry; audio component; lip state detection; louder multispeaker babble; loudspeaker; mobile telephone; mouth state detection; reflected excitation signal; resonance pattern tracking; silent speech interface; source placement; speech impaired user; speech-based systems; super-audible voice activity detection; vocal tract; voice prostheses; whisper input system; wideband acoustic background noise; wideband excitation signal; Chirp; Doppler effect; Lips; Mouth; Resonant frequency; Speech; Speech processing; silent speech interfaces; speech activity detection; voice activity detection; voice operated switch
Journal_Title :
IEEE/ACM Transactions on Audio, Speech, and Language Processing
DOI :
10.1109/TASLP.2014.2335055