Title :
Multi-stream temporally varying weight regression for cross-lingual speech recognition
Author :
Shilin Liu ; Khe Chai Sim
Author_Institution :
Sch. of Comput., Nat. Univ. of Singapore, Singapore, Singapore
Abstract :
Building a good Automatic Speech Recognition (ASR) system with limited resources is very challenging because of the many sources of variation in speech. Multilingual and cross-lingual speech recognition techniques are commonly used for this task. This paper investigates the recently proposed Temporally Varying Weight Regression (TVWR) method for cross-lingual speech recognition. TVWR uses posterior features to implicitly model the long-term temporal structures in acoustic patterns. By leveraging well-trained foreign recognizers, high-quality monophone/state posteriors can be easily incorporated into TVWR to boost ASR performance on low-resource languages. Furthermore, multi-stream TVWR is proposed, in which multiple sets of posterior features are used to incorporate richer (temporal and spatial) context information. Finally, separate state tying is used for the TVWR regression parameters to make better use of the more reliable posterior features. Experiments are reported for English and Malay speech recognition with limited resources. Using Czech, Hungarian and Russian posterior features, TVWR was found to consistently outperform tandem systems trained on the same features.
Keywords :
natural language processing; regression analysis; speech recognition; Czech posterior feature; English speech recognition; Hungarian posterior feature; Malay speech recognition; Russian posterior feature; acoustic pattern; automatic speech recognition; cross lingual speech recognition; long term temporal structure; monophone posterior; multistream speech recognition; spatial context information; state posterior; temporal context information; temporally varying weight regression; well trained foreign recognizer; Acoustics; Complexity theory; Context; Context modeling; Hidden Markov models; Speech; Speech recognition; context expansion; cross-lingual; decision tree clustering
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
Conference_Location :
Olomouc
DOI :
10.1109/ASRU.2013.6707769