Regional Information Center for Science and Technology - Combination of data borrowing strategies for low-resource LVCSR

DocumentCode :

672387

Title :

Combination of data borrowing strategies for low-resource LVCSR

Author :

Yanmin Qian ; Kai Yu ; Jia Liu

Author_Institution :

Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China

fYear :

2013

fDate :

8-12 Dec. 2013

Firstpage :

404

Lastpage :

409

Abstract :

Large vocabulary continuous speech recognition (LVCSR) is particularly difficult for low-resource languages, where only very limited manually transcribed data are available. However, it is often feasible to obtain a large amount of untranscribed data in the low-resource target language, or sufficient transcribed data in some non-target languages. Borrowing data from these additional sources to help LVCSR for low-resource languages has become an important research direction. This paper presents an integrated data borrowing framework for this scenario. Three data borrowing approaches are first investigated in detail, covering features, models and data corpora. They borrow data at different levels from additional sources, and all yield substantial performance improvements. As these strategies work independently, the obtained gains are likely additive. The three strategies are then combined to form an integrated data borrowing framework. Experiments show that the integrated framework yields a significant improvement over a conventional baseline, with an absolute WER reduction of more than 10%. In particular, the gain under the extremely limited low-resource scenario is 16%.

Keywords :

speech recognition; vocabulary; LVCSR; data borrowing strategies; data corpus; integrated data borrowing framework; large vocabulary continuous speech recognition; low-resource languages; manually transcribed data; untranscribed data; Data models; Detectors; Feature extraction; Hidden Markov models; Speech; Speech recognition; Training; Articulatory feature; Data borrowing; Low resource speech recognition; Subspace Gaussian mixture models; Unsupervised training;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on

Conference_Location :

Olomouc

Type :

conf

DOI :

10.1109/ASRU.2013.6707764

Filename :

6707764

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=672387