DocumentCode :
672377
Title :
DNN acoustic modeling with modular multi-lingual feature extraction networks
Author :
Gehring, Jonas ; Quoc Bao Nguyen ; Metze, Florian ; Waibel, Alex
Author_Institution :
Interactive Syst. Lab., Karlsruhe Inst. of Technol., Karlsruhe, Germany
fYear :
2013
fDate :
8-12 Dec. 2013
Firstpage :
344
Lastpage :
349
Abstract :
In this work, we propose several deep neural network architectures that are able to leverage data from multiple languages. Modularity is achieved by training networks for extracting high-level features and for estimating phoneme state posteriors separately, and then combining them for decoding in a hybrid DNN/HMM setup. This approach has been shown to achieve superior performance for single-language systems, and here we demonstrate that feature extractors benefit significantly from being trained as multi-lingual networks with shared hidden representations. We also show that existing mono-lingual networks can be re-used in a modular fashion to achieve a similar level of performance without having to train new networks on multi-lingual data. Furthermore, we investigate extending these architectures to make use of language-specific acoustic features. Evaluations are performed on a low-resource conversational telephone speech transcription task in Vietnamese, while additional data for acoustic model training is provided in Pashto, Tagalog, Turkish, and Cantonese. Improvements of up to 17.4% and 13.8% over mono-lingual GMMs and DNNs, respectively, are obtained.
Keywords :
feature extraction; natural language processing; neural nets; speech recognition; Cantonese language; DNN acoustic modeling; Pashto language; Tagalog language; Turkish language; Vietnamese language; acoustic model training; deep neural network; language specific acoustic features; modular multilingual feature extraction network; monolingual network; multilingual network; phoneme state posterior; single language system; training network; Acoustics; Adaptation models; Data models; Feature extraction; Hidden Markov models; Neural networks; Training; Deep Neural Networks; Large-Vocabulary Speech Recognition; Low-Resource Acoustic Modeling; Multi-Lingual Acoustic Modeling;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
Conference_Location :
Olomouc
Type :
conf
DOI :
10.1109/ASRU.2013.6707754
Filename :
6707754