Title :
Improving Mandarin Chinese STT system with Random Forests language models
Author :
Oparin, Ilya; Lamel, Lori; Gauvain, Jean-Luc
Author_Institution :
LIMSI, CNRS, Orsay, France
Date :
Nov. 29 - Dec. 3, 2010
Abstract :
The goal of this work is to assess the capacity of random forest language models estimated on a very large text corpus to improve the performance of an STT system. Previous experiments with random forests were mainly concerned with small- or medium-sized data tasks. In this work, the development version of the 2009 LIMSI Mandarin Chinese STT system was chosen as a challenging baseline to improve upon. This system is characterized by a language model trained on a very large text corpus (over 3.2 billion segmented words), making the baseline 4-gram estimates particularly robust. We observed moderate perplexity and CER improvements when this model was interpolated with a random forest language model. To attain the goal, we tried different strategies for building random forests on the available data and introduced a Forest of Random Forests language modeling scheme. However, the improvements obtained for large data over a well-tuned baseline N-gram model are less impressive than those reported for smaller data tasks.
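The central operation described in the abstract is linear interpolation of the random forest LM with the baseline 4-gram model, P(w|h) = lambda * P_RF(w|h) + (1 - lambda) * P_4g(w|h), with perplexity computed over the interpolated probabilities. Below is a minimal Python sketch of that combination; the toy lookup tables, the vocabulary, and the names p_ngram and p_rf are illustrative stand-ins, not the paper's actual models or tuned interpolation weights.

```python
import math

# Toy stand-ins for the two component models. In the paper these would be
# the robust baseline 4-gram model and the random forest LM; here each is
# a small lookup with a uniform fallback so the sketch runs end to end.
VOCAB = ["ni", "hao", "ma", "</s>"]

def p_ngram(word, history):
    table = {(("ni",), "hao"): 0.6, (("hao",), "ma"): 0.5}
    return table.get((history[-1:], word), 1.0 / len(VOCAB))

def p_rf(word, history):
    # A random forest LM averages predictions of many randomized decision
    # trees over the history; a single fixed table stands in for that here.
    table = {(("ni",), "hao"): 0.7, (("hao",), "ma"): 0.4}
    return table.get((history[-1:], word), 1.0 / len(VOCAB))

def interpolated_prob(word, history, lam):
    # P(w|h) = lam * P_RF(w|h) + (1 - lam) * P_4g(w|h)
    return lam * p_rf(word, history) + (1.0 - lam) * p_ngram(word, history)

def perplexity(words, lam):
    # PPL = exp(-(1/N) * sum(log P(w_i | h_i))), 4-gram histories
    log_prob = sum(
        math.log(interpolated_prob(w, tuple(words[max(0, i - 3):i]), lam))
        for i, w in enumerate(words)
    )
    return math.exp(-log_prob / len(words))

if __name__ == "__main__":
    sent = ["ni", "hao", "ma", "</s>"]
    for lam in (0.0, 0.3, 0.5):
        print(f"lambda={lam:.1f}  PPL={perplexity(sent, lam):.2f}")
```

In practice the weight lam would be tuned on held-out data rather than fixed as in this toy loop.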
Keywords :
decision trees; natural language processing; speech recognition; CER improvements; LIMSI Mandarin Chinese STT system; interpolation; perplexity; random forest language models; text corpus; well-tuned baseline N-gram model; data models; entropy; training; training data
Conference_Title :
7th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2010
Conference_Location :
Tainan
Print_ISBN :
978-1-4244-6244-5
DOI :
10.1109/ISCSLP.2010.5684903