Title :
Multi-lingual speech recognition with low-rank multi-task deep neural networks
Author :
Mohan, Aanchan ; Rose, Richard
Author_Institution :
Dept. of Electr. & Comput. Eng., McGill Univ., Montreal, QC, Canada
Abstract :
Multi-task learning (MTL) for deep neural network (DNN) multi-lingual acoustic models has been shown to be effective for learning parameters that are common or shared between multiple languages [1, 2]. In the MTL paradigm, the number of parameters in the output layer is large and scales with the number of languages used in training, so this output layer becomes a computational bottleneck. For mono-lingual DNNs, low-rank matrix factorization (LRMF) of the weight matrices has yielded large computational savings [3, 4]. The LRMF proposed in this work for MTL factors the original language-specific block matrices so that they "share" a common matrix, leaving low-rank language-specific block matrices. The impact of LRMF is presented in two scenarios: (a) improving performance in a target language when auxiliary languages are included during multi-lingual training; and (b) cross-language transfer to an unseen language with only 1 hour of transcribed training data. A 44% parameter reduction in the final layer yields a lower memory footprint and faster training times. An experimental study shows that the LRMF multi-lingual DNN provides competitive performance compared to a full-rank multi-lingual DNN in both scenarios.
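To make the factorization concrete, the sketch below approximates each language-specific output block W_l by U @ V_l, where U is shared across all languages and each V_l is a small per-language block. This is a minimal numpy sketch of the idea only: the hidden size, per-language target counts, and rank r are illustrative assumptions, not the paper's actual configuration (with these assumed numbers the final-layer reduction happens to land near the 44% figure quoted in the abstract).

```python
import numpy as np

# Illustrative dimensions (assumptions, not the paper's configuration).
hidden_dim = 1024                    # size of the last shared hidden layer
output_dims = [3000, 3000, 3000]     # assumed senone targets per language
r = 512                              # assumed shared low-rank bottleneck

rng = np.random.default_rng(0)

# Full-rank MTL output layer: one block matrix W_l per language.
full_params = sum(hidden_dim * n for n in output_dims)

# LRMF output layer: all languages share U; each language keeps a small
# block V_l, so W_l is approximated by U @ V_l.
U = rng.standard_normal((hidden_dim, r))
V = [rng.standard_normal((r, n)) for n in output_dims]
lrmf_params = hidden_dim * r + sum(r * n for n in output_dims)

# Forward pass for one language: project through the shared matrix first.
h = rng.standard_normal(hidden_dim)  # shared hidden activation
logits_lang0 = (h @ U) @ V[0]

print(f"final-layer parameter reduction: {1 - lrmf_params / full_params:.1%}")
```

Because U is computed once per frame regardless of how many languages are trained, the per-language cost scales with r rather than with hidden_dim, which is where the memory and training-time savings come from.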
Keywords :
neural nets; speech recognition; DNN multi-lingual acoustic models; LRMF impact; LRMF multi-lingual DNN; MTL paradigm; auxiliary languages; computational bottleneck; cross-language transfer; deep neural network multi-lingual acoustic models; full-rank multi-lingual DNN; learning parameters; low-rank language specific block matrices; low-rank matrix factorization; low-rank multi-task deep neural networks; lower memory footprint; mono-lingual DNN; multi-lingual speech recognition; multi-task learning; original language-specific block matrices; output layer; parameter reduction; training times; transcribed training data; unseen language; weight matrices; Acoustics; Adaptation models; Hidden Markov models; Neural networks; Speech; Speech recognition; Training; Low-resource speech recognition; Multi-lingual speech recognition; Multitask Learning; Neural Networks for speech recognition;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD, Australia
DOI :
10.1109/ICASSP.2015.7178921