DocumentCode :
1798846
Title :
Acoustic modeling for Hindi speech recognition in low-resource settings
Author :
Dey, Anamika ; Zhang, Weibin ; Fung, Pascale
Author_Institution :
Dept. of Electron. & Comput. Eng., Hong Kong Univ. of Sci. & Technol., Hong Kong, China
fYear :
2014
fDate :
7-9 July 2014
Firstpage :
891
Lastpage :
894
Abstract :
We propose an approach to acoustic modeling of Hindi speech for Hindi LVCSR that borrows data from English. Hindi, like many Indian languages, has a large speaker base, but few resources exist for obtaining large amounts of transcribed Hindi data for LVCSR. We compare a baseline Gaussian model-sharing approach with DNN training. A widely used data-borrowing method with DNNs is to first train a DNN on English, for which a large amount of training data is available, and then fine-tune the whole DNN, except the last layer, on the target Hindi data. We instead propose a three-stage method: first, map phones between Hindi and English; second, train Hindi acoustic models by sharing data between the mapped Hindi-English phone pairs; and finally, fine-tune the acoustic model on the Hindi data. We evaluate and compare these approaches in experiments using 1 hour of transcribed Hindi data and 15 hours of Wall Street Journal English data. Experiments show that the proposed method significantly outperforms conventional baseline models on phone recognition tasks in a low-resource setting.
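The data-borrowing steps described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the phone labels, the English-to-Hindi mapping, and the helper names `pool_training_data` and `replace_output_layer` are hypothetical, and the abstract does not specify the features, the actual phone map, or the training recipe. `pool_training_data` illustrates the sharing stage (each Hindi phone borrows frames from the English phones mapped to it); `replace_output_layer` illustrates one common reading of the DNN baseline, in which the hidden layers of an English-trained network are carried over and the output layer is re-initialized for the Hindi targets before fine-tuning.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# A labeled frame: (phone label, acoustic feature vector). In a real system the
# labels would come from forced alignment; here they are toy placeholders.
Frame = Tuple[str, List[float]]
# One layer: (weight rows, biases), with one weight row per output unit.
Layer = Tuple[List[List[float]], List[float]]


def pool_training_data(
    hindi_frames: List[Frame],
    english_frames: List[Frame],
    eng_to_hin: Dict[str, str],
) -> Dict[str, List[List[float]]]:
    """Group training frames by Hindi phone, borrowing English frames via a phone map.

    Hindi frames are always kept. An English frame is added to the pool of the
    Hindi phone it maps to; English phones with no mapping are dropped.
    """
    pooled: Dict[str, List[List[float]]] = defaultdict(list)
    for phone, feats in hindi_frames:
        pooled[phone].append(feats)
    for phone, feats in english_frames:
        target = eng_to_hin.get(phone)
        if target is not None:
            pooled[target].append(feats)
    return dict(pooled)


def replace_output_layer(layers: List[Layer], num_targets: int, seed: int = 0) -> List[Layer]:
    """Copy the hidden layers of a source-language DNN and attach a fresh output layer.

    The new output layer has `num_targets` units and the same input width as
    the old one; its weights are small Gaussian values and its biases are zero.
    Fine-tuning on target-language data would follow (not shown).
    """
    rng = random.Random(seed)
    hidden = [([row[:] for row in w], b[:]) for w, b in layers[:-1]]
    in_dim = len(layers[-1][0][0])
    new_w = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(num_targets)]
    return hidden + [(new_w, [0.0] * num_targets)]


# Hypothetical phone labels and mapping, for illustration only.
hindi = [("k", [0.1, 0.2]), ("aa", [0.3, 0.4]), ("kh", [0.5, 0.6])]
english = [("k", [0.7, 0.8]), ("aa", [0.9, 1.0]), ("th", [1.1, 1.2])]
pooled = pool_training_data(hindi, english, {"k": "k", "aa": "aa"})
print({p: len(v) for p, v in pooled.items()})  # {'k': 2, 'aa': 2, 'kh': 1}
```

In this toy mapping the aspirated Hindi /kh/, which has no English counterpart, is trained on Hindi frames alone, while /k/ and /aa/ draw on both languages; the unmapped English /th/ is discarded.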
Keywords :
Gaussian processes; acoustic signal processing; feedforward neural nets; hidden Markov models; learning (artificial intelligence); natural language processing; speaker recognition; speech processing; DNN training; GMM; Gaussian mixture models; HMM; Hindi LVCSR; Hindi speech recognition; Hindi-English phone pairs; Indian languages; Wall Street Journal English data; acoustic modeling; baseline Gaussian model-sharing approach; data sharing; deep neural network; feed-forward network; hidden Markov models; low-resource settings; phone recognition tasks; phonetic mapping; Acoustics; Data models; Feature extraction; Hidden Markov models; Speech; Speech recognition; Training; Hindi LVCSR; data borrowing; low resource; phone mapping;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Audio, Language and Image Processing (ICALIP), 2014 International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4799-3902-2
Type :
conf
DOI :
10.1109/ICALIP.2014.7009923
Filename :
7009923