Denoising autoencoder and environment adaptation for distant-talking speech recognition with asynchronous speech recording

Author

Wang, Longbiao ; Ren, Bo ; Ueda, Yuma ; Kai, Atsuhiko ; Teraoka, Shunta ; Fukushima, Taku

Author_Institution

Nagaoka Univ. of Technol., Nagaoka, Japan

fYear

2014

fDate

9-12 Dec. 2014

Firstpage

1

Lastpage

5

Abstract

In this paper, we propose a robust distant-talking speech recognition system with asynchronous speech recording. The system combines denoising autoencoder-based cepstral-domain dereverberation, automatic selection of asynchronous speech (microphone or mobile terminal), and environment adaptation. Although applications using mobile terminals have attracted increasing attention, few studies focus on distant-talking speech recognition with asynchronous mobile terminals. In the proposed system, a denoising autoencoder is first applied in the cepstral domain of speech to suppress reverberation, and Large Vocabulary Continuous Speech Recognition (LVCSR) is performed; automatic asynchronous mobile terminal selection and environment adaptation are then carried out using speech segments from the optimal mobile terminals. The proposed method was evaluated using a reverberant WSJCAM0 corpus, which was emitted by a loudspeaker and recorded in a meeting room with multiple speakers by multiple far-field mobile terminals. By integrating a cepstral-domain denoising autoencoder and automatic mobile terminal selection with environment adaptation, the average Word Error Rate (WER) was reduced from 51.8% for the baseline system to 28.8%, a relative error reduction of 44.4%, when using multi-condition acoustic models.
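
The relative error reduction quoted in the abstract can be checked from the two reported WER figures. The following is a minimal sketch of that arithmetic only; the function name `relative_error_reduction` is an illustrative assumption and is not taken from the paper.

```python
def relative_error_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# WER values as reported in the abstract (in percent).
reduction = relative_error_reduction(51.8, 28.8)
print(f"{reduction:.1f}%")
```

The absolute reduction is 23.0 percentage points; dividing by the 51.8% baseline gives the relative figure reported in the abstract.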

Keywords

cepstral analysis; speech coding; speech recognition; asynchronous mobile terminals; asynchronous speech recording; automatic asynchronous speech; cepstral-domain dereverberation; denoising autoencoder; distant-talking speech recognition; environment adaptation; far-field multiple mobile terminals; large vocabulary continuous speech recognition; word error rate; Hidden Markov models; Mobile communication; Noise reduction; Reverberation; Speech; Speech recognition

fLanguage

English

Publisher

IEEE

Conference_Titel

Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)

Conference_Location

Siem Reap

Type

conf

DOI

10.1109/APSIPA.2014.7041548

Filename

7041548