Title :
Improved phonotactic language identification using random forest language models
Author :
Wang, XiaoRui ; Wang, ShiJin ; Liang, JiaEn ; Xu, Bo
Author_Institution :
Digital Media Content Technol. Res. Center, Chinese Acad. of Sci., Beijing
fDate :
March 31 2008-April 4 2008
Abstract :
Recently a new language model, the random forest language model (RFLM), has been proposed and has shown encouraging results in speech recognition tasks. In this paper we apply the RFLM to language identification tasks. We propose a shared backoff smoothing method to deal with the data sparseness problem. Experiments were conducted on a subset of the NIST 2003 language recognition evaluation data. The RFLM obtained a 15.7% relative error rate reduction compared with the standard trigram LM. The RFLM can be used as a counterpart to the n-gram LM and BTLM for system fusion. We also empirically studied the relation between system performance and the number of trees in an RFLM.
Keywords :
decision trees; natural language processing; speech recognition; decision tree; n-gram LM; phonotactic language identification; random forest language models; shared backoff smoothing; sparseness problem; system fusion; Automation; Decoding; History; NIST; Natural languages; Pattern recognition; Random media; Smoothing methods; Decision Tree Language Models; Language Identification;
Conference_Title :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4244-1483-3
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2008.4518590