DocumentCode :
519270
Title :
Toward benchmarking a general-domain Thai LVCSR System
Author :
Chotimongkol, A. ; Saykhum, K. ; Thatphithakkul, N. ; Wutiwiwatchai, C.
Author_Institution :
Nat. Electron. & Comput. Technol. Center (NECTEC), Pathumthani, Thailand
fYear :
2010
fDate :
19-21 May 2010
Firstpage :
1080
Lastpage :
1084
Abstract :
We believe that a benchmark evaluation is one of the key factors that help accelerate research and development of a Thai speech recognition system, as various algorithms and training techniques can be systematically compared. In this paper, we are interested in benchmarking a general-domain Thai Large Vocabulary Continuous Speech Recognition (LVCSR) system using the LOTUS speech corpus. We conducted a set of experiments as an initial attempt to benchmark the performance of a general-domain Thai LVCSR system. In our experiments, we explored some variations of three acoustic model training parameters: the number of tied-state triphones, the number of Gaussian mixtures and the list of triphones. For language model training, we evaluated the usefulness of additional data from a large text corpus. We found that an acoustic model trained with a higher number of tied-state triphones and a higher number of Gaussian mixtures achieved better recognition accuracy. For language model training, we found that using additional data from a large text corpus helps improve the recognition performance of the LVCSR system. The best recognition performance in terms of word error rate on the LOTUS evaluation test set (ET) is 24.4%. This result was obtained when a list of triphones manually selected by a linguist was used for training an acoustic model with 3,000 tied-state triphones and 32 Gaussian mixtures, while the language model was a linear interpolation of two language models, one trained from the LOTUS training set (TR) and another trained from the large text corpus BEST.
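The abstract refers to two quantities that are easy to illustrate: the linear interpolation of two language models and the word error rate. The following is a minimal sketch, not the authors' implementation. The function names, the interpolation weight `lam`, and the whitespace tokenization are assumptions made for illustration; the abstract does not state the weight used, and Thai text would need to be segmented into words before scoring.

```python
def interpolate(p_a, p_b, lam):
    """Linearly interpolate two probabilities of the same word given the same history.

    p_a and p_b are hypothetical probabilities from two language models
    (for example, one per training corpus); lam is the weight on p_a.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_a + (1.0 - lam) * p_b


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Assumes both strings are already segmented into whitespace-separated words.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

In this sketch, a hypothesis with one substituted word and one missing word against a four-word reference yields a word error rate of 0.5. In practice the interpolation weight is usually tuned on held-out data rather than fixed by hand.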
Keywords :
Gaussian processes; interpolation; natural language processing; speech recognition; Gaussian mixture; LOTUS speech corpus; LVCSR system; Thai large vocabulary continuous speech recognition; acoustic model training parameter; benchmark evaluation; language model training; linear interpolation; tied-state triphone; Acceleration; Acoustic testing; Automatic speech recognition; Benchmark testing; Broadcasting; Natural languages; Research and development; Speech analysis; Speech recognition; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Electrical Engineering/Electronics Computer Telecommunications and Information Technology (ECTI-CON), 2010 International Conference on
Conference_Location :
Chiang Mai
Print_ISBN :
978-1-4244-5606-2
Electronic_ISBN :
978-1-4244-5607-9
Type :
conf
Filename :
5491642