Title :
Factored language modeling for Russian LVCSR
Author :
Vazhenina, Daria ; Markov, Konstantin
Author_Institution :
Human Interface Lab., Univ. of Aizu, Aizu-wakamatsu, Japan
Abstract :
The Russian language is characterized by very flexible word order, which limits the ability of standard n-grams to capture important regularities in the data. Moreover, Russian is a highly inflectional language with rich morphology, which leads to high out-of-vocabulary word rates. The factored language model (FLM) was recently proposed to address the problems of morphologically rich languages. In this paper, we describe our implementation of the FLM for Russian automatic speech recognition (ASR). We investigate the effect of different factors and propose a strategy for finding the best factor set and back-off path. Evaluation experiments showed that the FLM can decrease perplexity by as much as 20%. This yields a 4.0% relative reduction in word error rate (WER), which increases further to 6.9% when the FLM is interpolated with a conventional 3-gram LM.
Keywords :
natural language processing; speech recognition; ASR; FLM; Russian LVCSR; Russian language automatic speech recognition; WER; factored language modeling; morphologically rich languages; word error rate; Computational modeling; Context; Genetic algorithms; Interpolation; Speech; Speech recognition; Vocabulary; Russian language; factored language models; inflectional languages; language modeling
Conference_Title :
Awareness Science and Technology and Ubi-Media Computing (iCAST-UMEDIA), 2013 International Joint Conference on
Conference_Location :
Aizuwakamatsu
DOI :
10.1109/ICAwST.2013.6765434