DocumentCode :
117918
Title :
Denoising autoencoder and environment adaptation for distant-talking speech recognition with asynchronous speech recording
Author :
Wang, Longbiao ; Ren, Bo ; Ueda, Yuma ; Kai, Atsuhiko ; Teraoka, Shunta ; Fukushima, Taku
Author_Institution :
Nagaoka Univ. of Technol., Nagaoka, Japan
fYear :
2014
fDate :
9-12 Dec. 2014
Firstpage :
1
Lastpage :
5
Abstract :
This paper proposes a robust distant-talking speech recognition system for asynchronous speech recordings. The system combines denoising autoencoder-based cepstral-domain dereverberation, automatic selection of asynchronous speech channels (microphones or mobile terminals), and environment adaptation. Although applications using mobile terminals have attracted increasing attention, few studies address distant-talking speech recognition with asynchronous mobile terminals. In the proposed system, a denoising autoencoder is first applied in the cepstral domain to suppress reverberation and large-vocabulary continuous speech recognition (LVCSR) is performed; automatic asynchronous mobile terminal selection and environment adaptation are then carried out using speech segments from the optimal mobile terminals. The method was evaluated on a reverberant version of the WSJCAM0 corpus, played back through a loudspeaker and recorded in a meeting room with multiple speakers by multiple far-field mobile terminals. With multi-condition acoustic models, integrating the cepstral-domain denoising autoencoder and automatic mobile terminal selection with environment adaptation reduced the average word error rate (WER) from 51.8% for the baseline system to 28.8%, a relative error reduction of 44.4%.
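The relative error reduction quoted in the abstract can be checked from the two WER figures. The snippet below is only an illustrative sketch: the function name `relative_error_reduction` is an assumption made here, not something taken from the paper, and the two inputs are the baseline and final WER values reported above.

```python
def relative_error_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the relative WER reduction as a percentage of the baseline WER.

    Hypothetical helper for illustration; not part of the original paper.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# WER figures (in percent) as reported in the abstract.
reduction = relative_error_reduction(51.8, 28.8)
print(f"Relative error reduction: {reduction:.1f}%")
```

The absolute reduction is 23.0 percentage points; dividing by the 51.8% baseline gives the relative figure reported in the abstract.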
Keywords :
cepstral analysis; speech coding; speech recognition; asynchronous mobile terminals; asynchronous speech recording; automatic asynchronous speech; cepstral-domain dereverberation; denoising autoencoder; distant-talking speech recognition; environment adaptation; far-field multiple mobile terminals; large vocabulary continuous speech recognition; word error rate; Hidden Markov models; Mobile communication; Noise reduction; Reverberation; Speech; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
Conference_Location :
Siem Reap
Type :
conf
DOI :
10.1109/APSIPA.2014.7041548
Filename :
7041548