DocumentCode :
3630612
Title :
Morphological random forests for language modeling of inflectional languages
Author :
Ilya Oparin;Ondrej Glembek;Lukas Burget;Jan Cernocky
Author_Institution :
Dept. of Computer Science and Engineering, University of West Bohemia, Plzen, Czech Republic
fYear :
2008
Firstpage :
189
Lastpage :
192
Abstract :
In this paper, we are concerned with using decision trees (DT) and random forests (RF) in language modeling for Czech LVCSR. We show that the RF approach can be successfully implemented for language modeling of an inflectional language. The performance of word-based and morphological DTs and RFs was evaluated on a lecture recognition task. We show that while DTs perform worse than conventional trigram language models (LM), RFs of both kinds outperform them. Morphological RFs yield reductions in WER (up to 3.4% relative) and perplexity (10%) over the trigram model. Further improvement is obtained by interpolating the DT and RF LMs with the trigram LM (up to 15.6% perplexity and 4.8% relative WER reduction). We also investigate the distribution of morphological feature types chosen for splitting data at different levels of the DTs.
Keywords :
"Decision trees","Radio frequency","Natural languages","History","Training data","Computer science","Greedy algorithms","Interpolation","Speech recognition"
Publisher :
IEEE
Conference_Titel :
Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
Print_ISBN :
978-1-4244-3471-8
Type :
conf
DOI :
10.1109/SLT.2008.4777872
Filename :
4777872