Title :
Low-resource keyword search strategies for Tamil
Author :
Chen, Nancy F. ; Chongjia Ni ; Chen, Nancy F. ; Sivadas, Sunil ; Van Tung Pham ; Haihua Xu ; Xiong Xiao ; Tze Siong Lau ; Su Jun Leow ; Boon Pang Lim ; Cheung-Chi Leung ; Lei Wang ; Chin-Hui Lee ; Goh, Alvina ; Eng Siong Chng ; Bin Ma ; Haizhou Li
Author_Institution :
Inst. for Infocomm Res., A*STAR, Singapore, Singapore
Abstract :
We propose strategies for a state-of-the-art keyword search (KWS) system developed by the SINGA team in the context of the 2014 NIST Open Keyword Search Evaluation (OpenKWS14) using conversational Tamil provided by the IARPA Babel program. To tackle low-resource challenges and the rich morphological nature of Tamil, we present highlights of our current KWS system, including: (1) submodular optimization data selection to maximize acoustic diversity through Gaussian component indexed N-grams; (2) keyword-aware language modeling; (3) subword modeling of morphemes and homophones.
Keywords :
Gaussian processes; linguistics; optimisation; speech recognition; Gaussian component indexed N-gram; IARPA Babel program; NIST Open Keyword Search Evaluation; conversational Tamil; homophone subword model; keyword aware language model; low-resource keyword search strategy; morpheme subword model; state-of-the-art KWS system; submodular optimization data selection; Acoustics; Data models; Keyword search; Optimization; Speech; Speech recognition; Training; Spoken term detection (STD); active learning; agglutinative languages; deep neural network (DNN); inflective languages; keyword spotting; morphology; semi-supervised learning; under-resourced languages; unsupervised learning
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7178996