DocumentCode
3744889
Title
Single and multi-channel approaches for distant speech recognition under noisy reverberant conditions: I2R's system description for the ASpIRE challenge
Author
Jonathan Dennis;Tran Huy Dat
Author_Institution
Institute for Infocomm Research, A*STAR, 1 Fusionopolis Way, Singapore 138632
fYear
2015
Firstpage
518
Lastpage
524
Abstract
In this paper, we introduce the system developed at the Institute for Infocomm Research (I2R) for the ASpIRE (Automatic Speech recognition In Reverberant Environments) challenge. The main components of the system are a front-end processing system consisting of a distributed beamforming algorithm that performs adaptive weighting and channel elimination, a speech dereverberation approach using a maximum-kurtosis criterion, and a robust voice activity detection (VAD) module based on the sub-harmonic ratio (SHR). The acoustic back-end consists of a multi-conditional Deep Neural Network (DNN) model that uses speaker-adapted features, combined with a decoding strategy that performs semi-supervised DNN model adaptation using weighted labels generated from the first-pass decoding output. On the single-microphone evaluation, our system achieved a word error rate (WER) of 44.8%. With the incorporation of beamforming on the multi-microphone evaluation, our system achieved an absolute WER improvement of over 6%, giving the best evaluation result of 38.5%.
Keywords
"Speech","Training","Speech recognition","Acoustics","Adaptation models","Robustness","Testing"
Publisher
ieee
Conference_Titel
Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
Type
conf
DOI
10.1109/ASRU.2015.7404839
Filename
7404839