DocumentCode
2660165
Title
Improving word segmentation for Thai speech translation
Author
Charoenpornsawat, Paisarn ; Schultz, Tanja
fYear
2008
fDate
15-19 Dec. 2008
Firstpage
241
Lastpage
244
Abstract
A vocabulary list and a language model are primary components of a speech translation system. Generating both from plain text is straightforward for English. However, it is quite challenging for Chinese, Japanese, or Thai, which provide no word segmentation, i.e., the text has no word boundary delimiter. For Thai word segmentation, maximal matching, a lexicon-based approach, is one of the popular methods. Nevertheless, this method relies heavily on the coverage of the lexicon. When the text contains an unknown word, this method usually produces a wrong boundary. When words are then extracted from the segmented text, some words will not be retrieved because of the wrong segmentation. In this paper, we propose statistical techniques to tackle this problem. Based on different word segmentation methods, we develop various speech translation systems and show that the proposed method can significantly improve the translation accuracy by about 6.42% BLEU points compared to the baseline system.
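The lexicon dependence described in the abstract can be illustrated with a minimal sketch of lexicon-based maximal matching. This is not the authors' implementation: it assumes the common formulation in which the segmentation with the fewest tokens is chosen, and the function name `maximal_matching`, the `unknown_penalty` parameter, and the toy English lexicon are all hypothetical choices made for illustration.

```python
def maximal_matching(text, lexicon, unknown_penalty=1):
    """Segment `text` into the fewest tokens, using words from `lexicon`.

    Assumption for this sketch: a character that no lexicon word covers is
    emitted as a single-character token with an extra cost of
    `unknown_penalty`, so lexicon words are preferred where possible.
    """
    n = len(text)
    max_len = max((len(w) for w in lexicon), default=1)
    best = [float("inf")] * (n + 1)  # best[i]: minimal cost of segmenting text[:i]
    back = [0] * (n + 1)             # back[i]: start index of the last token in text[:i]
    best[0] = 0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            if piece in lexicon:
                cost = best[j] + 1
            elif i - j == 1:
                cost = best[j] + 1 + unknown_penalty  # unknown single character
            else:
                continue
            if cost < best[i]:
                best[i] = cost
                back[i] = j
    # Follow the back-pointers from the end of the text to recover the tokens.
    tokens = []
    i = n
    while i > 0:
        j = back[i]
        tokens.append(text[j:i])
        i = j
    return tokens[::-1]


lexicon = {"the", "cat", "at"}
print(maximal_matching("thecat", lexicon))  # every word is in the lexicon
print(maximal_matching("atlas", lexicon))   # "atlas" is unknown to the lexicon
```

In this toy run, the out-of-lexicon string "atlas" is split around the known word "at", so extracting words from the segmented output would never recover "atlas". This is the kind of boundary error on unknown words that the abstract attributes to lexicon-based segmentation.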
Keywords
feature extraction; language translation; natural language processing; speech recognition; statistical analysis; vocabulary; Thai speech translation; language model; lexicon-based approach; maximal matching; statistical techniques; text segmentation; vocabulary list; word extraction; word segmentation; Automatic speech recognition; Dictionaries; Entropy; Natural languages; Surface-mount technology; Text processing; Training data; Spoken language translation
fLanguage
English
Publisher
IEEE
Conference_Titel
Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
Conference_Location
Goa
Print_ISBN
978-1-4244-3471-8
Electronic_ISBN
978-1-4244-3472-5
Type
conf
DOI
10.1109/SLT.2008.4777885
Filename
4777885