Investigation of multilingual deep neural networks for spoken term detection

Author

Knill, K.M. ; Gales, Mark J.F. ; Rath, Satish Prasad ; Woodland, Philip C. ; Zhang, Chenghui ; Zhang, S.-X.

Author_Institution

Dept. of Eng., Univ. of Cambridge, Cambridge, UK

fYear

2013

fDate

8-12 Dec. 2013

Firstpage

138

Lastpage

143

Abstract

The development of high-performance speech processing systems for low-resource languages is a challenging area. One approach to address the lack of resources is to make use of data from multiple languages. A popular direction in recent years is to use bottleneck features, or hybrid systems, trained on multilingual data for speech-to-text (STT) systems. This paper presents an investigation into the application of these multilingual approaches to spoken term detection. Experiments were run using the IARPA Babel limited language pack corpora (~10 hours/language), with 4 languages used for initial multilingual system development and an additional held-out target language. STT gains achieved through using multilingual bottleneck features in a Tandem configuration are shown to also apply to keyword search (KWS). Further improvements in both STT and KWS were observed by incorporating language questions into the Tandem GMM-HMM decision trees for the training set languages. Adapted hybrid systems performed slightly worse on average than the adapted Tandem systems. A language independent acoustic model test on the target language showed that retraining or adapting the acoustic models to the target language is currently minimally needed to achieve reasonable performance.

Keywords

Gaussian processes; decision trees; hidden Markov models; mixture models; natural language processing; neural nets; speech recognition; speech synthesis; IARPA Babel limited language pack corpora; KWS; STT systems; Tandem configuration; high-performance speech processing systems; hybrid systems; initial multilingual system development; keyword search; language independent acoustic model test; low-resource languages; multilingual bottleneck features; multilingual deep neural networks; speech-to-text systems; spoken term detection; tandem GMM-HMM decision trees; training set languages; Acoustics; Decision trees; Hidden Markov models; Speech; Speech recognition; Training; Training data; Multilingual; keyword search; neural networks; speech recognition; spoken term detection;

fLanguage

English

Publisher

IEEE

Conference_Titel

Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on

Conference_Location

Olomouc

Type

conf

DOI

10.1109/ASRU.2013.6707719

Filename

6707719