DocumentCode
179551
Title
Deep learning of split temporal context for automatic speech recognition
Author
Baccouche, Moez ; Besset, Benoit ; Collen, Patrice ; Le Blouch, Olivier
Author_Institution
Orange Labs - France Telecom, Cesson-Sévigné, France
fYear
2014
fDate
4-9 May 2014
Firstpage
5422
Lastpage
5426
Abstract
This paper follows recent advances in speech recognition that recommend replacing the standard hybrid GMM/HMM approach with deep neural architectures. These models have been shown to drastically improve recognition performance, owing to their ability to capture the underlying structure of the data. However, they remain particularly complex, since the entire temporal context of a given phoneme is learned with a single model, which must therefore have a very large number of trainable weights. This work proposes an alternative solution that splits the temporal context into blocks, each learned with a separate deep model. We demonstrate that this approach significantly reduces the number of parameters compared to the classical deep learning procedure, and obtains better results on the TIMIT dataset, among the best in the state of the art (a phone error rate, PER, of 20.20%). We also show that our approach is able to assimilate data of different natures, ranging from wide to narrow bandwidth signals.
Keywords
Gaussian processes; hidden Markov models; learning (artificial intelligence); speech recognition; TIMIT dataset; automatic speech recognition; deep learning procedure; deep neural architectures; split temporal context; standard hybrid GMM-HMM approach; Acoustics; Computer architecture; Context; Speech; Training; deep learning; neural networks
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location
Florence
Type
conf
DOI
10.1109/ICASSP.2014.6854639
Filename
6854639