DocumentCode :
3423219
Title :
Improved phonotactic language identification using random forest language models
Author :
Wang, XiaoRui ; Wang, ShiJin ; Liang, JiaEn ; Xu, Bo
Author_Institution :
Digital Media Content Technol. Res. Center, Chinese Acad. of Sci., Beijing
fYear :
2008
fDate :
March 31 2008-April 4 2008
Firstpage :
4237
Lastpage :
4240
Abstract :
Recently, a new language model, the random forest language model (RFLM), was proposed and has shown encouraging results in speech recognition tasks. In this paper, we applied the RFLM to language identification tasks. We proposed a shared backoff smoothing method to deal with the data sparseness problem. Experiments were conducted on a subset of the NIST 2003 language recognition evaluation data. The RFLM obtained a 15.7% relative error rate reduction compared with the standard trigram LM. The RFLM can be used as a counterpart to the n-gram LM and BTLM for system fusion. We also empirically studied the relation between system performance and the number of trees in an RFLM.
Keywords :
decision trees; natural language processing; speech recognition; decision tree; n-gram LM; phonotactic language identification; random forest language models; shared backoff smoothing; sparseness problem; speech recognition; system fusion; Automation; Decision trees; Decoding; History; NIST; Natural languages; Pattern recognition; Random media; Smoothing methods; Speech recognition; Decision Tree Language Models; Language Identification; Random Forest Language Models;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
ISSN :
1520-6149
Print_ISBN :
978-1-4244-1483-3
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2008.4518590
Filename :
4518590