Title :
A data-driven phoneme mapping technique using interpolation vectors of phone-cluster adaptive training
Author :
Abraham, Basil ; Joy, Neethu Mariam ; Umesh, Navneeth K. S.
Author_Institution :
Dept. of Electr. Eng., Indian Inst. of Technol., Madras, Chennai, India
Abstract :
One of the major problems in acoustic modeling for a low-resource language is data sparsity. In recent years, cross-lingual acoustic modeling techniques have been employed to overcome this problem. In this paper, we propose multiple cross-lingual techniques to address the problem of data insufficiency. The first method, which we call cross-lingual phone-CAT, uses the principles of phone-cluster adaptive training (phone-CAT), in which the parameters of context-dependent states are obtained by linear interpolation of monophone cluster models. The second method uses the interpolation vectors of phone-CAT, which are known to capture phonetic context information, to map phonemes between two languages. Finally, the data-driven phoneme-mapping technique is incorporated into cross-lingual phone-CAT to obtain what we call phoneme-mapped cross-lingual phone-CAT. The proposed techniques are employed in acoustic modeling of three Indian languages, namely Bengali, Hindi, and Tamil. In a low-resource scenario, phoneme-mapped cross-lingual phone-CAT gave relative improvements of 15.14% for Bengali, 16.4% for Hindi, and 11.3% for Tamil over the conventional cross-lingual subspace Gaussian mixture model (SGMM).
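The two ideas in the abstract (context-dependent state parameters as a linear interpolation of monophone cluster models, and phoneme mapping driven by the interpolation vectors) can be illustrated with a minimal Python/NumPy sketch. This is not the authors' implementation. It assumes one mean vector per state (the abstract speaks more generally of state parameters), a shared matrix of monophone cluster means so that interpolation vectors from both languages live in the same space, and a hypothetical mapping rule: average each phoneme's interpolation vectors and choose the source phoneme nearest in Euclidean distance. The function names and toy vectors are illustrative only.

```python
import numpy as np


def cd_state_mean(M: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Phone-CAT-style state mean: linear interpolation of cluster means.

    M: (D, P) matrix whose columns are the P monophone cluster means.
    v: (P,) interpolation vector of one context-dependent state.
    """
    return M @ v


def phone_signature(state_vectors: list[np.ndarray]) -> np.ndarray:
    # Hypothetical summary of a phoneme: the average interpolation vector
    # over all context-dependent states that belong to it.
    return np.mean(np.stack(state_vectors), axis=0)


def map_phonemes(target: dict[str, np.ndarray],
                 source: dict[str, np.ndarray]) -> dict[str, str]:
    # Assumed criterion: map each target phoneme to the source phoneme whose
    # signature is closest in Euclidean distance.
    mapping = {}
    for t_phone, t_vec in target.items():
        mapping[t_phone] = min(
            source, key=lambda s: np.linalg.norm(source[s] - t_vec)
        )
    return mapping


# Toy example with P = 3 monophone clusters (values are made up).
source_sigs = {
    "a": np.array([0.9, 0.1, 0.0]),
    "i": np.array([0.1, 0.9, 0.0]),
    "u": np.array([0.0, 0.1, 0.9]),
}
target_sigs = {
    "aa": phone_signature([np.array([0.8, 0.2, 0.0]), np.array([1.0, 0.0, 0.0])]),
    "ee": phone_signature([np.array([0.2, 0.8, 0.0])]),
}
print(map_phonemes(target_sigs, source_sigs))
```

Under this rule, a target phoneme whose states mostly draw on the same monophone clusters as a source phoneme is mapped to it; another distance or similarity measure could be substituted without changing the overall structure.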
Keywords :
computational linguistics; interpolation; natural language processing; speech recognition; Bengali language; Hindi language; Indian languages; Tamil language; acoustic modeling; context-dependent state parameters; data insufficiency problem; data sparsity; data-driven phoneme mapping technique; linear interpolation vectors; low-resource language; low-resource scenario; monophone cluster models; multiple cross-lingual techniques; phone-cluster adaptive training; phoneme-mapped cross-lingual phone-CAT; phonetic context information capture; Acoustics; Adaptation models; Data models; Hidden Markov models; Interpolation; Training; Vectors; cross-lingual; low-resource; phone-CAT; phoneme mapping;
Conference_Title :
Spoken Language Technology Workshop (SLT), 2014 IEEE
DOI :
10.1109/SLT.2014.7078546