Title :
Deep learning of split temporal context for automatic speech recognition
Author :
Baccouche, Moez ; Besset, Benoit ; Collen, Patrice ; Le Blouch, Olivier
Author_Institution :
Orange Labs - France Telecom, Cesson-Sévigné, France
Abstract :
This paper follows recent advances in speech recognition that recommend replacing the standard hybrid GMM/HMM approach with deep neural architectures. These models have been shown to drastically improve recognition performance, owing to their ability to capture the underlying structure of the data. However, they remain particularly complex, since the entire temporal context of a given phoneme is learned with a single model, which must therefore have a very large number of trainable weights. This work proposes an alternative solution that splits the temporal context into blocks, each learned with a separate deep model. We demonstrate that this approach significantly reduces the number of parameters compared to the classical deep learning procedure and obtains better results on the TIMIT dataset, among the best in the state of the art (a phone error rate, PER, of 20.20%). We also show that our approach is able to assimilate data of different natures, ranging from wide-bandwidth to narrow-bandwidth signals.
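The idea of splitting a temporal context window into blocks, each handled by its own model, can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `split_context` and `mlp_weight_count`, the 11-frame window, the 39-dimensional features, the 183 outputs, and all layer sizes are hypothetical choices made only for illustration, and the sketch does not model how the per-block outputs would be combined.

```python
# Illustrative sketch only: block count, frame counts, and layer sizes
# below are assumptions, not values taken from the paper.


def split_context(frames, num_blocks):
    """Split a context window (list of frames) into contiguous blocks.

    Blocks are as equal in length as possible; earlier blocks take the
    extra frames when the window does not divide evenly.
    """
    if not 1 <= num_blocks <= len(frames):
        raise ValueError("num_blocks must be between 1 and len(frames)")
    base, extra = divmod(len(frames), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        size = base + (1 if i < extra else 0)
        blocks.append(frames[start:start + size])
        start += size
    return blocks


def mlp_weight_count(input_dim, hidden_dims, output_dim):
    """Count weights and biases of a fully connected network."""
    dims = [input_dim, *hidden_dims, output_dim]
    return sum(a * b + b for a, b in zip(dims[:-1], dims[1:]))


if __name__ == "__main__":
    feat_dim = 39                # hypothetical per-frame feature size
    context = list(range(11))    # hypothetical 11-frame window (frame indices)
    blocks = split_context(context, 3)

    # One large model over the whole window (hypothetical layer sizes).
    single = mlp_weight_count(len(context) * feat_dim, [2048, 2048], 183)

    # One smaller model per block (hypothetical layer sizes).
    split = sum(
        mlp_weight_count(len(b) * feat_dim, [512, 512], 183) for b in blocks
    )

    print(blocks)
    print("single model weights:", single)
    print("split model weights:", split)
```

Whether the split arrangement ends up with fewer weights depends entirely on the layer sizes chosen for the per-block models; the abstract reports a reduction but does not give the sizes, so the numbers here should not be read as the paper's.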
Keywords :
Gaussian processes; hidden Markov models; learning (artificial intelligence); speech recognition; TIMIT dataset; automatic speech recognition; deep learning procedure; deep neural architectures; split temporal context; standard hybrid GMM-HMM approach; Acoustics; Computer architecture; Context; Speech; Training; deep learning; neural networks
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6854639