Regional Information Center for Science and Technology (RICEST) - Joint acoustic modeling of triphones and trigraphemes by multi-task learning deep neural networks for low-resource speech recognition

DocumentCode :

179595

Title :

Joint acoustic modeling of triphones and trigraphemes by multi-task learning deep neural networks for low-resource speech recognition

Author :

Dongpeng Chen ; Brian Mak ; Cheung-Chi Leung ; Sunil Sivadas

Author_Institution :

Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China

Year :

2014

Date :

4-9 May 2014

Firstpage :

5592

Lastpage :

5596

Abstract :

It is well known in machine learning that multitask learning (MTL) can improve the generalization performance of individual learning tasks when the tasks trained in parallel are related, especially when the amount of training data is relatively small. In this paper, we investigate the estimation of triphone acoustic models in parallel with the estimation of trigrapheme acoustic models under the MTL framework using a deep neural network (DNN). As triphone modeling and trigrapheme modeling are highly related learning tasks, a better shared internal representation (the hidden layers) can be learned to improve their generalization performance. Experimental evaluation on three low-resource South African languages shows that triphone DNNs trained by the MTL approach perform significantly better, by ~3-13%, than triphone DNNs trained by the single-task learning (STL) approach. The MTL-DNN triphone models also outperform the ROVER result that combines a triphone STL-DNN and a trigrapheme STL-DNN.
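The shared-representation idea in the abstract can be illustrated with a minimal sketch: one hidden layer feeds two softmax output layers, one for triphone states and one for trigrapheme states, and the two cross-entropy losses are summed. Everything specific here is an assumption made for illustration, not taken from the paper: the class name `MTLNet`, the single sigmoid hidden layer, the layer sizes, the unweighted sum of the two losses, and the toy input.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def affine(W, b, x):
    """Compute W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def init(rows, cols, rng):
    """Small random weights (hypothetical initialization scheme)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MTLNet:
    """Hypothetical two-task network: shared hidden layer, two output heads."""

    def __init__(self, n_in, n_hidden, n_triphone, n_trigrapheme, seed=0):
        rng = random.Random(seed)
        # Shared hidden layer: updated by both tasks during training.
        self.W_h, self.b_h = init(n_hidden, n_in, rng), [0.0] * n_hidden
        # Task-specific output layers.
        self.W_p, self.b_p = init(n_triphone, n_hidden, rng), [0.0] * n_triphone
        self.W_g, self.b_g = init(n_trigrapheme, n_hidden, rng), [0.0] * n_trigrapheme

    def forward(self, x):
        # Sigmoid hidden activations shared by both heads.
        h = [1.0 / (1.0 + math.exp(-a)) for a in affine(self.W_h, self.b_h, x)]
        p_triphone = softmax(affine(self.W_p, self.b_p, h))
        p_trigrapheme = softmax(affine(self.W_g, self.b_g, h))
        return p_triphone, p_trigrapheme


def mtl_loss(p_triphone, p_trigrapheme, y_triphone, y_trigrapheme):
    """Sum of the two cross-entropy losses (equal weighting is an assumption)."""
    return -math.log(p_triphone[y_triphone]) - math.log(p_trigrapheme[y_trigrapheme])


# Toy usage: one 4-dimensional frame with made-up target labels.
net = MTLNet(n_in=4, n_hidden=8, n_triphone=5, n_trigrapheme=3)
p_tri, p_gra = net.forward([0.1, -0.2, 0.3, 0.0])
loss = mtl_loss(p_tri, p_gra, y_triphone=2, y_trigrapheme=1)
```

Because both losses backpropagate into the same hidden layer, each task acts as a regularizer for the other. At recognition time only the head of interest (for example the triphone head) would need to be evaluated.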

Keywords :

learning (artificial intelligence); natural language processing; neural nets; speech recognition; MTL-DNN triphone models; South African languages; deep neural networks; joint acoustic modeling; machine learning; multitask learning; single-task learning; trigrapheme STL-DNN; trigrapheme acoustic models; triphone STL-DNN; triphone acoustic models; Acoustics; Joints; Neural networks; Speech; Speech processing; Training; Training data; trigrapheme modeling; triphone modeling

Language :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location :

Florence

Type :

Conference paper

DOI :

10.1109/ICASSP.2014.6854673

Filename :

6854673

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=179595