DocumentCode :
3744847
Title :
High-performance Swahili keyword search with very limited language pack: The THUEE system for the OpenKWS15 evaluation
Author :
Meng Cai;Zhiqiang Lv;Cheng Lu;Jian Kang;Like Hui;Zhuo Zhang;Jia Liu
Author_Institution :
Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
fYear :
2015
Firstpage :
215
Lastpage :
222
Abstract :
This paper presents the Swahili keyword search system developed by the THUEE team for the OpenKWS15 evaluation, which NIST conducts under the IARPA Babel program. Highlights of the system's development include automatic generation of the pronunciation lexicon, aggressive data augmentation, a multilingual bottleneck feature extractor trained on 6 languages, text selection from web data for language model training, semi-supervised training of acoustic and language models, out-of-vocabulary keyword detection using morphemes, and a diverse set of systems for combination. A wide variety of acoustic modeling techniques are explored and compared, and up to 12 individual systems are combined. The system achieves state-of-the-art performance in the required condition of the evaluation.
Keywords :
"Training","Data models","Hidden Markov models","Keyword search","Acoustics","Tuning","Training data"
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
Type :
conf
DOI :
10.1109/ASRU.2015.7404797
Filename :
7404797