Title :
Improving language models for ASR using translated in-domain data
Author :
Stefan Kombrink;Tomáš Mikolov;Martin Karafiát;Lukáš Burget
Author_Institution :
Brno University of Technology, Czech Republic
fDate :
3/1/2012 12:00:00 AM
Abstract :
Acquiring in-domain training data to build speech recognition systems for under-resourced languages can be a costly, time-consuming, and tedious process. In this work, we propose using machine translation to translate English transcripts of telephone speech into Czech in order to improve a Czech conversational telephone speech (CTS) recognition system. The translated transcripts serve as additional language model training data in a scenario where the baseline language model is trained only on out-of-domain and close-domain data. We report perplexities, out-of-vocabulary (OOV) rates, and word error rates, and we examine how suitable different data sets and translators are for the described task.
Keywords :
"Data models","Speech","Dictionaries","Google","Speech recognition","Acoustics","Decoding"
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Print_ISBN :
978-1-4673-0045-2
DOI :
10.1109/ICASSP.2012.6288896