Regional Information Center for Science and Technology - Multi-stream temporally varying weight regression for cross-lingual speech recognition

DocumentCode :

672392

Title :

Multi-stream temporally varying weight regression for cross-lingual speech recognition

Author :

Shilin Liu ; Khe Chai Sim

Author_Institution :

Sch. of Comput., Nat. Univ. of Singapore, Singapore, Singapore

fYear :

2013

fDate :

8-12 Dec. 2013

Firstpage :

434

Lastpage :

439

Abstract :

Building a good Automatic Speech Recognition (ASR) system with limited resources is very challenging because of the many variations in speech. Multilingual and cross-lingual speech recognition techniques are commonly used for this task. This paper investigates the recently proposed Temporally Varying Weight Regression (TVWR) method for cross-lingual speech recognition. TVWR uses posterior features to implicitly model the long-term temporal structures in acoustic patterns. By leveraging well-trained foreign recognizers, high-quality monophone/state posteriors can be easily incorporated into TVWR to boost ASR performance on low-resource languages. Furthermore, multi-stream TVWR is proposed, where multiple sets of posterior features are used to incorporate richer (temporal and spatial) context information. Finally, separate state tying for the TVWR regression parameters is used to better utilize the more reliable posterior features. Experiments are conducted on English and Malay speech recognition with limited resources. Using Czech, Hungarian and Russian posterior features, TVWR was found to consistently outperform tandem systems trained on the same features.

Keywords :

natural language processing; regression analysis; speech recognition; Czech posterior feature; English speech recognition; Hungarian posterior feature; Malay speech recognition; Russian posterior feature; acoustic pattern; automatic speech recognition; cross lingual speech recognition; long term temporal structure; monophone posterior; multistream speech recognition; spatial context information; state posterior; temporal context information; temporally varying weight regression; well trained foreign recognizer; Acoustics; Complexity theory; Context; Context modeling; Hidden Markov models; Speech; Speech recognition; context expansion; cross-lingual; decision tree clustering;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on

Conference_Location :

Olomouc

Type :

conf

DOI :

10.1109/ASRU.2013.6707769

Filename :

6707769

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=672392