DocumentCode
3423219
Title
Improved phonotactic language identification using random forest language models
Author
Wang, XiaoRui ; Wang, ShiJin ; Liang, JiaEn ; Xu, Bo
Author_Institution
Digital Media Content Technol. Res. Center, Chinese Acad. of Sci., Beijing
fYear
2008
fDate
March 31 2008-April 4 2008
Firstpage
4237
Lastpage
4240
Abstract
Recently a new language model, the random forest language model (RFLM), was proposed and has shown encouraging results in speech recognition tasks. In this paper we applied the RFLM to language identification tasks. We proposed a shared backoff smoothing method to deal with the data sparseness problem. Experiments were conducted on a subset of the NIST 2003 language recognition evaluation data. The RFLM obtained a 15.7% relative error rate reduction compared with the standard trigram LM. The RFLM can be used as a counterpart to the n-gram LM and BTLM for system fusion. We also empirically studied the relation between system performance and the number of trees in an RFLM.
Keywords
decision trees; natural language processing; speech recognition; decision tree; n-gram LM; phonotactic language identification; random forest language models; shared backoff smoothing; sparseness problem; speech recognition; system fusion; Automation; Decision trees; Decoding; History; NIST; Natural languages; Pattern recognition; Random media; Smoothing methods; Speech recognition; Decision Tree Language Models; Language Identification; Random Forest Language Models;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location
Las Vegas, NV
ISSN
1520-6149
Print_ISBN
978-1-4244-1483-3
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2008.4518590
Filename
4518590