Title :
100K+ words, machine-readable, pronunciation dictionary for the Romanian language
Author :
Domokos, József ; Buza, Ovidiu ; Toderean, Gavril
Abstract :
This paper presents NaviRo, a newly developed pronunciation dictionary for the Romanian language. The dictionary contains more than 100k words from the DexOnline dictionary together with their phonetic transcriptions in the Speech Assessment Methods Phonetic Alphabet (SAMPA), a machine-readable alphabet. The paper also describes the development of the pronunciation dictionary and the system architecture. The NaviRo pronunciation dictionary is freely available on the project website in the HTK (Hidden Markov Model Toolkit) and Festival Speech Synthesis System dictionary formats. The grapheme and phoneme sets used, together with audio samples for the phonemes, are also available for download. Use of these resources is unrestricted for any research purpose, with the aim of promoting Romanian language speech technology research.
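As an illustration of how a dictionary distributed in an HTK-style text format might be consumed, the following minimal Python sketch reads lines of the form "word phone phone ..." into a lookup table. It assumes the simplest layout (a word followed by whitespace-separated SAMPA phone symbols, with no output symbols or pronunciation probabilities). The function name `parse_htk_dictionary` and the two sample entries are hypothetical and are not taken from the NaviRo files.

```python
from collections import defaultdict


def parse_htk_dictionary(lines):
    """Map each word to a list of pronunciations.

    Each pronunciation is a list of phone symbols. A word may appear on
    several lines, in which case every variant is kept in file order.
    Assumes the simplest layout: WORD followed by whitespace-separated phones.
    """
    lexicon = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        word, *phones = line.split()
        if phones:  # skip malformed lines that carry no phones
            lexicon[word].append(phones)
    return dict(lexicon)


# Hypothetical sample entries in SAMPA-like notation (illustrative only).
SAMPLE_LINES = [
    "casa  k a s a",
    "mama  m a m a",
    "",
]

lexicon = parse_htk_dictionary(SAMPLE_LINES)
print(lexicon["casa"])
```

Keeping a list of pronunciations per word, rather than a single one, leaves room for entries with more than one transcription.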
Keywords :
Web sites; dictionaries; hidden Markov models; natural language processing; speech processing; word processing; DexOnline dictionary; HTK; NaviRo pronunciation dictionary; Romanian language pronunciation dictionary; Romanian language speech technology research; SAMPA; audio samples; festival speech synthesis system dictionary format; grapheme set; hidden Markov model toolkit; machine-readable dictionary; phoneme set; phonetic transcriptions; project Website; speech assessment method phonetic alphabet; system architecture; Artificial neural networks; Context; Databases; Dictionaries; Encoding; Speech; Speech recognition; Romanian language speech recognition; grapheme-to-phoneme conversion; letter-to-sound conversion; phonetic transcription; speech synthesis pronunciation dictionary;
Conference_Title :
Signal Processing Conference (EUSIPCO), 2012 Proceedings of the 20th European
Conference_Location :
Bucharest
Print_ISBN :
978-1-4673-1068-0