Title :
High-performance Swahili keyword search with very limited language pack: The THUEE system for the OpenKWS15 evaluation
Author :
Meng Cai;Zhiqiang Lv;Cheng Lu;Jian Kang;Like Hui;Zhuo Zhang;Jia Liu
Author_Institution :
Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Abstract :
This paper presents the Swahili keyword search system developed by the THUEE team for the OpenKWS15 evaluation, conducted by NIST under the IARPA Babel program. Highlights of the system development include automatic generation of the pronunciation lexicon, aggressive data augmentation, a multilingual bottleneck feature extractor trained on 6 languages, text selection from web data for language model training, semi-supervised training of acoustic and language models, out-of-vocabulary keyword detection using morphemes, and a diverse set of systems for combination. A wide variety of acoustic modeling techniques are explored and compared, and up to 12 individual systems are combined. The system achieves state-of-the-art performance in the required condition of the evaluation.
Keywords :
"Training","Data models","Hidden Markov models","Keyword search","Acoustics","Tuning","Training data"
Conference_Titel :
Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
DOI :
10.1109/ASRU.2015.7404797