DocumentCode :
3632023
Title :
Turkish broadcast news transcription with open-source software
Author :
Dogan Can;Murat Saraclar
Author_Institution :
Department of Electrical and Electronics Engineering, Boğaziçi University, 34342 Bebek, İstanbul, Türkiye
fYear :
2009
fDate :
4/1/2009 12:00:00 AM
Firstpage :
325
Lastpage :
328
Abstract :
In this paper, we present our Turkish large vocabulary continuous speech recognition (LVCSR) system. The system is built on open-source software (HTK, SRILM) and uses 187 hours of Turkish broadcast news data, together with a 184-million-word text corpus collected from various Turkish news portals. Within this system, we developed three acoustic models, trained under the maximum likelihood (ML), maximum mutual information (MMI) and minimum phone error (MPE) criteria, and investigated the contribution of discriminative acoustic modeling to Turkish LVCSR. Recognition experiments using a trigram language model with a 50K-word vocabulary give word error rates of 25.8% with ML, 24.3% with MMI and 23.7% with MPE.
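The reported word error rates are easier to compare in relative terms. The sketch below is a minimal illustration, not the authors' scoring setup. It defines word error rate in the conventional way, as (substitutions + deletions + insertions) divided by the number of reference words, computed with a word-level edit distance. It then expresses the abstract's MMI and MPE results as reductions from the ML baseline. The helper names `word_error_rate` and `relative_reduction` are hypothetical. Production scoring tools, such as HTK's HResults, also handle alignment details and text normalization, which this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length (hypothetical helper)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the minimum number of edits turning ref[:i-1] into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1          # substitution costs 1, a match costs 0
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional WER reduction relative to a baseline (hypothetical helper)."""
    return (baseline - improved) / baseline


# WER figures (%) as reported in the abstract
REPORTED_WER = {"ML": 25.8, "MMI": 24.3, "MPE": 23.7}

if __name__ == "__main__":
    for criterion in ("MMI", "MPE"):
        base, new = REPORTED_WER["ML"], REPORTED_WER[criterion]
        print(f"{criterion}: {base - new:.1f} points absolute, "
              f"{100 * relative_reduction(base, new):.1f}% relative")
```

Running the script prints an absolute gain of 1.5 points (5.8% relative) for MMI and 2.1 points (8.1% relative) for MPE over the ML baseline. As a toy check of the definition, `word_error_rate("a b c d", "a c d")` is 0.25: one deletion against four reference words.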
Keywords :
"Open source software","Broadcasting","Vocabulary","Maximum likelihood estimation","Speech recognition","Portals","Error analysis","Mutual information","Lattices"
Publisher :
IEEE
Conference_Title :
Signal Processing and Communications Applications Conference, 2009. SIU 2009. IEEE 17th
ISSN :
2165-0608
Print_ISBN :
978-1-4244-4435-9
Type :
conf
DOI :
10.1109/SIU.2009.5136398
Filename :
5136398