DocumentCode :
3102133
Title :
Broadcast news transcription in Central-East European languages
Author :
Tarjan, Balazs ; Mozsolics, T. ; Balog, Andras ; Halmos, D. ; Fegyo, Tibor ; Mihajlik, Peter
Author_Institution :
THINKTech Research Center, Hungary
fYear :
2012
fDate :
2-5 Dec. 2012
Firstpage :
59
Lastpage :
64
Abstract :
This paper addresses two main issues: first, how to develop broadcast news transcription systems for Central-East European languages in a short time when only limited language-specific knowledge is available; and second, how to improve an existing system by using an on-line learning method. Accordingly, we present recognition results for two newly developed news transcription systems for Polish and Romanian, which are trained in a fully data-driven manner based on only a few hours of manual transcriptions and web materials. In addition, an automatic language model updating method is presented for our Hungarian transcription system. Continuous updating of the language model resulted in a 2% relative WER (Word Error Rate) reduction measured over a 3-month period, primarily due to better matching of language model parameters for IV (Intra Vocabulary) words and secondarily due to the reduction of OOV (Out Of Vocabulary) words. To the best of our knowledge, this study publishes the first Romanian broadcast news recognition results.
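The abstract reports its gain as a relative WER reduction and attributes part of it to fewer OOV words. As a minimal sketch (not the authors' tooling), the Python below computes WER as the word-level edit distance (substitutions + deletions + insertions) divided by the reference length, the relative reduction as (baseline - updated) / baseline, and the OOV rate as the share of tokens missing from a vocabulary. The function names and any numbers used with them are hypothetical and illustrative, not results from the paper.

```python
def word_errors(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # cost of turning an empty ref into hyp[:j]
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match at zero cost)
            )
        prev = cur
    return prev[-1]


def wer(ref: list[str], hyp: list[str]) -> float:
    """Word Error Rate relative to the reference length."""
    return word_errors(ref, hyp) / len(ref)


def relative_reduction(baseline_wer: float, updated_wer: float) -> float:
    """Relative (not absolute) improvement, e.g. 0.02 means a 2% relative reduction."""
    return (baseline_wer - updated_wer) / baseline_wer


def oov_rate(tokens: list[str], vocabulary: set[str]) -> float:
    """Fraction of running tokens not covered by the recognizer's vocabulary."""
    return sum(t not in vocabulary for t in tokens) / len(tokens)
```

For example, with a hypothetical baseline WER of 0.25 and an updated WER of 0.245, `relative_reduction(0.25, 0.245)` gives about 0.02, i.e. a 2% relative reduction even though the absolute drop is only half a percentage point.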
Keywords :
Hungarian; LVCSR; Polish; Romanian; broadcast news; cognitive infocommunication; morphologically rich languages; speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cognitive Infocommunications (CogInfoCom), 2012 IEEE 3rd International Conference on
Conference_Location :
Kosice, Slovakia
Print_ISBN :
978-1-4673-5187-4
Electronic_ISBN :
978-1-4673-5186-7
Type :
conf
DOI :
10.1109/CogInfoCom.2012.6421940
Filename :
6421940