Regional Information Center for Science and Technology - Multilingual representations for low resource speech recognition and keyword search

DocumentCode :

3744853

Title :

Multilingual representations for low resource speech recognition and keyword search

Author :

Jia Cui;Brian Kingsbury;Bhuvana Ramabhadran;Abhinav Sethy;Kartik Audhkhasi;Xiaodong Cui;Ellen Kislal;Lidia Mangu;Markus Nussbaum-Thom;Michael Picheny;Zoltan Tüske;Pavel Golik;Ralf Schlüter;Hermann Ney;Mark J. F. Gales;Kate M. Knill;Anton Ragni;Haipeng Wan

Author_Institution :

IBM Watson, 1101 Kitchawan Rd, Yorktown Heights, NY, 10598, U.S.A.

Year :

2015

Firstpage :

259

Lastpage :

266

Abstract :

This paper examines the impact of multilingual (ML) acoustic representations on Automatic Speech Recognition (ASR) and keyword search (KWS) for low resource languages in the context of the OpenKWS15 evaluation of the IARPA Babel program. The task is to develop Swahili ASR and KWS systems within two weeks using as little as 3 hours of transcribed data. Multilingual acoustic representations proved to be crucial for building these systems under strict time constraints. The paper discusses several key insights on how these representations are derived and used. First, we present a data sampling strategy that can speed up the training of multilingual representations without appreciable loss in ASR performance. Second, we show that fusion of diverse multilingual representations developed at different LORELEI sites yields substantial ASR and KWS gains. Speaker adaptation and data augmentation of these representations improve both ASR and KWS performance (up to 8.7% relative). Third, incorporating untranscribed data through semi-supervised learning improves WER and KWS performance. Finally, we show that these multilingual representations significantly improve ASR and KWS performance (relative 9% for WER and 5% for MTWV) even when forty hours of transcribed audio in the target language is available. Multilingual representations significantly contributed to the LORELEI KWS systems winning the OpenKWS15 evaluation.

Keywords :

"Training","Training data","Keyword search","Context","Data models","Acoustics","Neural networks"

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on

Type :

conf

DOI :

10.1109/ASRU.2015.7404803

Filename :

7404803

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3744853