DocumentCode
3752152
Title
Improving bottleneck features for automatic speech recognition using gammatone-based cochleagram and sparsity regularization
Author
Chao Ma;Jun Qi;Dongmei Li;Runsheng Liu
Author_Institution
Department of Electronic Engineering, Tsinghua University, Beijing, China, 100084
fYear
2015
Firstpage
63
Lastpage
67
Abstract
Bottleneck (BN) features, particularly those based on deep neural network structures, have been successfully applied to Automatic Speech Recognition (ASR) tasks. This paper continues the study of improving BN features for ASR by employing two methods: (1) using a cochleagram generated by Gammatone filters as the input feature of a deep neural network; (2) imposing a sparsity regularization on the bottleneck layer to control the sparsity level of the BN features by constraining the hidden units to be, on average, inactive most of the time. Our experiments on the Wall Street Journal (WSJ) database demonstrate that the two approaches deliver performance gains for BN features in ASR tasks. In addition, further experiments on the WSJ database at different noise levels show that the cochleagram input is more robust to noise than the commonly used Mel-scaled filterbank.
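As a rough illustration of the sparsity regularization described in the abstract, the sketch below applies a standard KL-divergence sparsity penalty (of the kind used in sparse autoencoders) to bottleneck-layer activations. The target sparsity rho, penalty weight beta, the assumption of sigmoid activations, and the KL formulation itself are illustrative choices, not necessarily the exact cost term used in the paper.

import numpy as np

def sparsity_penalty(bn_activations, rho=0.05, beta=3.0, eps=1e-8):
    # bn_activations: (num_frames, num_bn_units) array of sigmoid outputs in (0, 1).
    # rho: target average activation per unit (a small value means units are mostly inactive).
    # beta: weight of the penalty term added to the network's training cost.
    rho_hat = np.clip(bn_activations.mean(axis=0), eps, 1.0 - eps)  # observed average activation per unit
    kl = rho * np.log(rho / rho_hat) + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat))
    return beta * kl.sum()

# Toy usage: 1000 frames, 40 bottleneck units with random activations.
acts = np.random.uniform(0.0, 1.0, size=(1000, 40))
print(sparsity_penalty(acts))

Adding this term to the network cost pushes each bottleneck unit's average activation toward rho, so the resulting BN features become sparser as rho is decreased or beta is increased.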
Keywords
"Neural networks","Frequency modulation","Indexes","Cost function","Automatic speech recognition","Mel frequency cepstral coefficient"
Publisher
IEEE
Conference_Title
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015 Asia-Pacific
Type
conf
DOI
10.1109/APSIPA.2015.7415401
Filename
7415401