Deep learning of split temporal context for automatic speech recognition

Author

Baccouche, Moez ; Besset, Benoit ; Collen, Patrice ; Le Blouch, Olivier

Author_Institution

Orange Labs. - France Telecom, Cesson-Sévigné, France

fYear

2014

fDate

4-9 May 2014

Firstpage

5422

Lastpage

5426

Abstract

This paper follows recent advances in speech recognition that recommend replacing the standard hybrid GMM/HMM approach with deep neural architectures. These models have been shown to drastically improve recognition performance, owing to their ability to capture the underlying structure of the data. However, they remain particularly complex, since the entire temporal context of a given phoneme is learned with a single model, which must therefore have a very large number of trainable weights. This work proposes an alternative solution that splits the temporal context into blocks, each learned with a separate deep model. We demonstrate that this approach significantly reduces the number of parameters compared to the classical deep learning procedure and obtains better results on the TIMIT dataset, with a phone error rate (PER) of 20.20%, which is among the best state-of-the-art results. We also show that our approach is able to assimilate data of different natures, ranging from wide to narrow bandwidth signals.
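
As a rough illustration of the splitting step described in the abstract, the sketch below cuts a window of consecutive acoustic frames into contiguous blocks, each of which could then be fed to its own model. This is not the authors' implementation: the function name `split_context`, the 11-frame window, and the choice of 3 blocks are assumptions made only for this example, and the abstract does not state how the blocks are sized or whether they overlap.

```python
from typing import List, Sequence

Frame = Sequence[float]  # one acoustic feature vector (hypothetical type alias)


def split_context(window: Sequence[Frame], num_blocks: int) -> List[List[Frame]]:
    """Split a window of consecutive frames into contiguous, non-overlapping blocks.

    Assumption for this sketch: blocks are as equal in length as possible,
    with any leftover frames assigned to the earliest blocks.
    """
    if num_blocks <= 0 or num_blocks > len(window):
        raise ValueError("num_blocks must be between 1 and the number of frames")

    base, extra = divmod(len(window), num_blocks)
    blocks: List[List[Frame]] = []
    start = 0
    for i in range(num_blocks):
        # The first `extra` blocks each take one additional frame.
        size = base + (1 if i < extra else 0)
        blocks.append(list(window[start:start + size]))
        start += size
    return blocks


# Hypothetical usage: an 11-frame context window with 1-dimensional features.
window = [[float(t)] for t in range(11)]
blocks = split_context(window, 3)
block_sizes = [len(b) for b in blocks]
```

In a full system, each block would be passed to a separate deep model and the per-block outputs would be combined. How that combination is done is not specified in the abstract, so it is left out of this sketch.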

Keywords

Gaussian processes; hidden Markov models; learning (artificial intelligence); speech recognition; TIMIT dataset; automatic speech recognition; deep learning procedure; deep neural architectures; split temporal context; standard hybrid GMM-HMM approach; Acoustics; Computer architecture; Context; Speech; Training; deep learning; neural networks

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location

Florence

Type

conf

DOI

10.1109/ICASSP.2014.6854639

Filename

6854639