DocumentCode :
3744880
Title :
Multi-channel speech processing architectures for noise robust speech recognition: 3rd CHiME challenge results
Author :
Lukas Pfeifenberger;Tobias Schrank;Matthias Zöhrer;Martin Hagmüller;Franz Pernkopf
Author_Institution :
Signal Processing and Speech Communication Laboratory, Graz University of Technology, Graz, Austria
fYear :
2015
Firstpage :
452
Lastpage :
459
Abstract :
Recognizing speech under noisy conditions is an ill-posed problem. The CHiME 3 challenge targets robust speech recognition in realistic environments such as street, bus, café and pedestrian areas. We study variants of beamformers used for pre-processing multi-channel speech recordings. In particular, we investigate three variants of generalized side-lobe canceller (GSC) beamformers, i.e. GSC with sparse blocking matrix (BM), GSC with adaptive BM (ABM), and GSC with minimum variance distortionless response (MVDR) and ABM. Furthermore, we apply several post-filters to further enhance the speech signal. We introduce MaxPower postfilters and deep neural postfilters (DPFs). DPFs outperformed our baseline systems significantly when measuring the overall perceptual score (OPS) and the perceptual evaluation of speech quality (PESQ). In particular, DPFs achieved an average relative improvement of 17.54% OPS points and 18.28% in PESQ when compared to the CHiME 3 baseline. DPFs also achieved the best WER when combined with an ASR engine on simulated development and evaluation data, i.e. 8.98% and 10.82% WER, respectively. The proposed MaxPower beamformer achieved the best overall WER on CHiME 3 real development and evaluation data, i.e. 14.23% and 22.12%, respectively.
Keywords :
"Speech","Speech recognition","Microphones","Artificial neural networks","Speech enhancement","Array signal processing"
Publisher :
IEEE
Conference_Titel :
Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
Type :
conf
DOI :
10.1109/ASRU.2015.7404830
Filename :
7404830