Multi-sensory speech processing: incorporating automatically extracted hidden dynamic information

Author

Subramanya, Amarnag ; Deng, Li ; Liu, Zicheng ; Zhang, Zhengyou

Author_Institution

SSLI Lab, Washington Univ., Seattle, WA, USA

fYear

2005

fDate

6-8 July 2005

Abstract

We describe a novel technique for multi-sensory speech processing that enhances noisy speech and improves noise-robust speech recognition. Speech is captured with both an air-conductive and a bone-conductive microphone; the bone-sensor signal carries virtually noise-free hidden dynamic information about the clean speech in the form of formant trajectories. Distortions in the bone-sensor signal, such as teeth clacking and noise leakage, can be effectively removed by using formant information extracted automatically from that same signal. This paper reports an improved technique for synthesizing speech waveforms from LPC cepstra computed analytically from the formant trajectories. When this synthesized signal stream is fused with the other available speech data streams, recognition performance on noisy speech improves.
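The abstract states that LPC cepstra are computed analytically from formant trajectories but does not give the mapping. The following is a minimal sketch, not the authors' method. It assumes each formant (frequency f_k, bandwidth b_k) is a conjugate pole pair of an all-pole filter 1/A(z), with pole radius exp(-pi*b_k/fs) and angle 2*pi*f_k/fs. Under that assumption the cepstrum has the closed form c_n = (2/n) * sum_k exp(-pi*n*b_k/fs) * cos(2*pi*n*f_k/fs). The function name and its arguments are hypothetical.

```python
import math


def formants_to_lpc_cepstrum(formants_hz, bandwidths_hz, fs_hz, n_ceps):
    """Return c_1..c_{n_ceps} of the all-pole filter 1/A(z) whose conjugate
    pole pairs are given by formant frequencies and bandwidths (in Hz).

    Assumption: one pole pair per formant, gain term (c_0) ignored.
    """
    if len(formants_hz) != len(bandwidths_hz):
        raise ValueError("need exactly one bandwidth per formant")
    ceps = []
    for n in range(1, n_ceps + 1):
        total = 0.0
        for f, b in zip(formants_hz, bandwidths_hz):
            radius = math.exp(-math.pi * b / fs_hz)  # pole radius from bandwidth
            angle = 2.0 * math.pi * f / fs_hz        # pole angle from frequency
            # A conjugate pair p, p* contributes (p^n + p*^n) / n = 2 r^n cos(n*theta) / n
            total += radius ** n * math.cos(n * angle)
        ceps.append(2.0 * total / n)
    return ceps


if __name__ == "__main__":
    # Illustrative (made-up) formant frame at an 8 kHz sampling rate
    print(formants_to_lpc_cepstrum([500.0, 1500.0, 2500.0],
                                   [60.0, 90.0, 120.0], 8000.0, 12))
```

Because the cepstrum of an all-pole filter is a sum over its poles, each formant contributes independently, so a frame-by-frame formant track maps directly to a cepstral track without first estimating LPC coefficients.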

Keywords

cepstral analysis; feature extraction; microphones; sensor fusion; signal denoising; speech enhancement; speech recognition; speech synthesis; LPC cepstra; air-conductive microphone; automatic hidden information extraction; bone sensor signal distortion; bone-conductive microphone; data stream; formant trajectory; multisensory speech processing; noisy speech enhancement; speech data capturing; speech waveform synthesis; virtual noise-free dynamic information; Bones; Data mining; Distortion; Microphones; Noise robustness; Speech analysis; Speech enhancement; Speech processing; Speech recognition; Speech synthesis

fLanguage

English

Publisher

IEEE

Conference_Title

2005 IEEE International Conference on Multimedia and Expo (ICME 2005)

Print_ISBN

0-7803-9331-7

Type

conf

DOI

10.1109/ICME.2005.1521611

Filename

1521611