DocumentCode :
3433085
Title :
Low-resource keyword search strategies for Tamil
Author :
Chen, Nancy F. ; Chongjia Ni ; Chen, Nancy F. ; Sivadas, Sunil ; Van Tung Pham ; Haihua Xu ; Xiong Xiao ; Tze Siong Lau ; Su Jun Leow ; Boon Pang Lim ; Cheung-Chi Leung ; Lei Wang ; Chin-Hui Lee ; Goh, Alvina ; Eng Siong Chng ; Bin Ma ; Haizhou Li
Author_Institution :
Inst. for Infocomm Res., A*STAR, Singapore, Singapore
fYear :
2015
fDate :
19-24 April 2015
Firstpage :
5366
Lastpage :
5370
Abstract :
We propose strategies for a state-of-the-art keyword search (KWS) system developed by the SINGA team in the context of the 2014 NIST Open Keyword Search Evaluation (OpenKWS14) using conversational Tamil provided by the IARPA Babel program. To tackle low-resource challenges and the rich morphological nature of Tamil, we present highlights of our current KWS system, including: (1) Submodular optimization data selection to maximize acoustic diversity through Gaussian component indexed N-grams; (2) Keyword-aware language modeling; (3) Subword modeling of morphemes and homophones.
Keywords :
Gaussian processes; linguistics; optimisation; speech recognition; Gaussian component indexed N-gram; IARPA Babel program; NIST Open Keyword Search Evaluation; conversational Tamil; homophone subword model; keyword aware language model; low-resource keyword search strategy; morpheme subword model; speech recognition; state-of-the-art KWS system; submodular optimization data selection; Acoustics; Data models; Keyword search; Optimization; Speech; Speech recognition; Training; Spoken term detection (STD); active learning; agglutinative languages; deep neural network (DNN); inflective languages; keyword spotting; morphology; semi-supervised learning; under-resourced languages; unsupervised learning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
Type :
conf
DOI :
10.1109/ICASSP.2015.7178996
Filename :
7178996