DocumentCode :
3648282
Title :
Improving language models for ASR using translated in-domain data
Author :
Stefan Kombrink;Tomáš Mikolov;Martin Karafiát;Lukáš Burget
Author_Institution :
Brno University of Technology, Czech Republic
fYear :
2012
fDate :
3/1/2012 12:00:00 AM
Firstpage :
4405
Lastpage :
4408
Abstract :
Acquiring in-domain training data to build speech recognition systems for under-resourced languages can be a costly, time-consuming and tedious process. In this work, we propose using machine translation to translate English transcripts of telephone speech into Czech in order to improve a Czech conversational telephone speech (CTS) recognition system. The translated transcripts are used as additional language model training data in a scenario where the baseline language model is trained on off-domain and close-domain data only. We report perplexities, OOV rates and word error rates, and examine the suitability of different data sets and translators for the described task.
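The abstract reports three standard metrics: OOV rate, perplexity and word error rate. As a hedged illustration only (not the authors' toolkit or experimental setup), the sketch below computes all three on toy data and combines a baseline model with a model trained on translated text. The add-alpha unigram models, the linear interpolation with weight `lam`, the toy Czech-like tokens and all function names are assumptions; the abstract does not specify the LM order, smoothing method, or how the translated data were combined with the baseline data.

```python
import math
from collections import Counter


def oov_rate(vocab, test_tokens):
    """Fraction of test tokens not covered by the LM vocabulary."""
    oov = sum(1 for w in test_tokens if w not in vocab)
    return oov / len(test_tokens)


def unigram_probs(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over vocab plus an <unk> class.

    Assumption: a unigram model stands in for the n-gram LMs of a real system.
    """
    counts = Counter(w if w in vocab else "<unk>" for w in tokens)
    support = set(vocab) | {"<unk>"}
    total = sum(counts.values()) + alpha * len(support)
    return {w: (counts[w] + alpha) / total for w in support}


def interpolate(p_base, p_trans, lam):
    """Linear interpolation P(w) = lam * P_base(w) + (1 - lam) * P_trans(w).

    Both models must share the same support (built from the same vocab).
    """
    return {w: lam * p_base[w] + (1 - lam) * p_trans[w] for w in p_base}


def perplexity(probs, test_tokens):
    """exp of the average negative log-probability; unseen words map to <unk>."""
    log_sum = sum(math.log(probs.get(w, probs["<unk>"])) for w in test_tokens)
    return math.exp(-log_sum / len(test_tokens))


def word_error_rate(ref, hyp):
    """Word-level Levenshtein distance divided by reference length (ref non-empty)."""
    d = list(range(len(hyp) + 1))  # row for the empty reference prefix
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i  # prev holds the diagonal cell of the previous row
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,          # deletion
                      d[j - 1] + 1,      # insertion
                      prev + (r != h))   # substitution or match
            prev, d[j] = d[j], cur
    return d[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical toy data: baseline (off/close-domain) text,
    # translated in-domain text, and a test utterance.
    base = "dobry den jak se mate".split()
    translated = "jak se mas ja se mam dobre".split()
    test = "jak se mas dobre".split()

    vocab_base = set(base)
    vocab_comb = vocab_base | set(translated)
    print("OOV baseline vocab:", oov_rate(vocab_base, test))     # 0.5
    print("OOV with translations:", oov_rate(vocab_comb, test))  # 0.0

    p_base = unigram_probs(base, vocab_comb)
    p_trans = unigram_probs(translated, vocab_comb)
    print("PPL baseline:", perplexity(p_base, test))
    print("PPL interpolated:", perplexity(interpolate(p_base, p_trans, 0.5), test))
    print("WER:", word_error_rate(test, "jak se mate dobre".split()))  # 0.25
```

In this toy case, adding the translated words to the vocabulary removes both OOV tokens, and the interpolated model gives the test utterance a lower perplexity than the baseline model alone. A real system would use higher-order n-gram models with proper smoothing and would tune `lam` on held-out in-domain data.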
Keywords :
"Data models","Speech","Dictionaries","Google","Speech recognition","Acoustics","Decoding"
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
978-1-4673-0045-2
Type :
conf
DOI :
10.1109/ICASSP.2012.6288896
Filename :
6288896