• DocumentCode
    3744847
  • Title
    High-performance Swahili keyword search with very limited language pack: The THUEE system for the OpenKWS15 evaluation
  • Author
    Meng Cai;Zhiqiang Lv;Cheng Lu;Jian Kang;Like Hui;Zhuo Zhang;Jia Liu
  • Author_Institution
    Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
  • fYear
    2015
  • Firstpage
    215
  • Lastpage
    222
  • Abstract
    This paper presents the Swahili keyword search system developed by the THUEE team for the OpenKWS15 evaluation, which is conducted by NIST under the IARPA Babel program. Highlights of the system's development include automatic generation of the pronunciation lexicon, aggressive data augmentation, a multilingual bottleneck feature extractor trained on 6 languages, text selection from web data for language model training, semi-supervised training of acoustic models and language models, out-of-vocabulary keyword detection using morphemes, and a rich diversity of systems for combination. A wide variety of acoustic modeling techniques are explored and compared. Up to 12 different individual systems are used for combination. The system achieves state-of-the-art performance in the required condition of the evaluation.
  • Keywords
    "Training","Data models","Hidden Markov models","Keyword search","Acoustics","Tuning","Training data"
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
  • Type
    conf
  • DOI
    10.1109/ASRU.2015.7404797
  • Filename
    7404797