Improving data selection for low-resource STT and KWS

Author

Thiago Fraga-Silva;Antoine Laurent;Jean-Luc Gauvain;Lori Lamel;Viet-Bac Le;Abdel Messaoudi

Author_Institution

Vocapia Research, 28 rue Jean Rostand, 91400 Orsay, France

fYear

2015

Firstpage

153

Lastpage

159

Abstract

This paper extends recent research on training data selection for speech transcription and keyword spotting system development. Selection techniques were explored in the context of the IARPA-Babel Active Learning (AL) task for 6 languages. Different selection criteria were considered with the goal of improving over a system built using a pre-defined 3-hour training data set. Four variants of the entropy-based criterion were explored, based on words, triphones, phones, and the HMM states previously introduced in [4]. The influence of the number of HMM states was assessed, as was the effect of using automatic rather than manual reference transcripts. The combination of selection criteria was investigated, and a novel multi-stage selection method was proposed. This method was also assessed using larger data sets than were permitted in the Babel AL task. Results are reported for the 6 languages. The multi-stage selection was also applied to the surprise language (Swahili) in the NIST OpenKWS 2015 evaluation.

Keywords

"Speech","Hidden Markov models","Acoustics","Entropy","Training","Decoding","Training data"

Publisher

IEEE

Conference_Titel

Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on

Type

conf

DOI

10.1109/ASRU.2015.7404788

Filename

7404788