Regional Information Center for Science and Technology - Toward benchmarking a general-domain Thai LVCSR System

DocumentCode :

519270

Title :

Toward benchmarking a general-domain Thai LVCSR System

Author :

Chotimongkol, A. ; Saykhum, K. ; Thatphithakkul, N. ; Wutiwiwatchai, C.

Author_Institution :

Nat. Electron. & Comput. Technol. Center (NECTEC), Pathumthani, Thailand

fYear :

2010

fDate :

19-21 May 2010

Firstpage :

1080

Lastpage :

1084

Abstract :

We believe that a benchmark evaluation is one of the key factors that help accelerate research and development of Thai speech recognition systems, because it allows various algorithms and training techniques to be compared systematically. In this paper, we are interested in benchmarking a general-domain Thai Large Vocabulary Continuous Speech Recognition (LVCSR) system using the LOTUS speech corpus. We conducted a set of experiments as an initial attempt to benchmark the performance of such a system. In our experiments, we explored variations of three acoustic model training parameters: the number of tied-state triphones, the number of Gaussian mixtures, and the list of triphones. For language model training, we evaluated the usefulness of additional data from a large text corpus. We found that an acoustic model trained with a higher number of tied-state triphones and a higher number of Gaussian mixtures achieved better recognition accuracy. For language model training, we found that using additional data from a large text corpus helps improve the recognition performance of the LVCSR system. The best recognition performance, in terms of word error rate on the LOTUS evaluation test set (ET), is 24.4%. This result was obtained when a list of triphones manually selected by a linguist was used to train an acoustic model with 3,000 tied-state triphones and 32 Gaussian mixtures, while the language model was a linear interpolation of two language models, one trained on the LOTUS training set (TR) and the other trained on the large text corpus BEST.
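The two quantities the abstract relies on, linear interpolation of two language models and word error rate, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the interpolation weight `lam`, and the assumption that the text is already segmented into words are all hypothetical; the abstract gives neither the weight used nor the scoring tool.

```python
from typing import Sequence


def interpolate(p_tr: float, p_best: float, lam: float) -> float:
    """Linearly interpolate two language-model probabilities for one word.

    p_tr and p_best stand for the probabilities assigned by the TR-trained
    and BEST-trained models; lam is a hypothetical mixing weight in [0, 1].
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * p_tr + (1.0 - lam) * p_best


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and hypothesis[:j]; it starts as the distance from an empty prefix.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)
```

For example, `word_error_rate("a b c d".split(), "a x c".split())` counts one substitution and one deletion against four reference words. Under this definition, the reported 24.4% corresponds to roughly one word-level error for every four reference words.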

Keywords :

Gaussian processes; interpolation; natural language processing; speech recognition; Gaussian mixture; LOTUS speech corpus; LVCSR system; Thai large vocabulary continuous speech recognition; acoustic model training parameter; benchmark evaluation; language model training; linear interpolation; tied-state triphone; Acceleration; Acoustic testing; Automatic speech recognition; Benchmark testing; Broadcasting; Natural languages; Research and development; Speech analysis; Speech recognition; Vocabulary;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Electrical Engineering/Electronics Computer Telecommunications and Information Technology (ECTI-CON), 2010 International Conference on

Conference_Location :

Chiang Mai

Print_ISBN :

978-1-4244-5606-2

Electronic_ISBN :

978-1-4244-5607-9

Type :

conf

Filename :

5491642

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=519270