Title :
Turkish broadcast news transcription with open-source software
Author :
Dogan Can;Murat Saraclar
Author_Institution :
Elektrik Elektronik Mühendisliği Bölümü, Boğaziçi Üniversitesi, 34342, Bebek, İstanbul, Türkiye
fDate :
4/1/2009 12:00:00 AM
Abstract :
In this paper, we present our Turkish large vocabulary continuous speech recognition (LVCSR) system, which is built on open-source software (HTK, SRILM) and uses 187 hours of Turkish broadcast news data together with a 184-million-word text corpus collected from various Turkish news portals. Within this system, three acoustic models were developed, optimizing the maximum likelihood (ML), maximum mutual information (MMI) and minimum phone error (MPE) criteria respectively, and the contribution of discriminative acoustic modeling to Turkish LVCSR was investigated. Recognition experiments using a trigram language model with a 50K-word vocabulary give word error rates of 25.8% with ML, 24.3% with MMI and 23.7% with MPE.
Keywords :
"Open source software","Broadcasting","Vocabulary","Maximum likelihood estimation","Speech recognition","Portals","Error analysis","Mutual information","Lattices"
Conference_Titel :
Signal Processing and Communications Applications Conference, 2009. SIU 2009. IEEE 17th
Print_ISBN :
978-1-4244-4435-9
DOI :
10.1109/SIU.2009.5136398