• DocumentCode
    1688853
  • Title
    Methods for rapid development of automatic speech recognition system for Russian
  • Author
    Safarik, Radek ; Nouza, Jan
  • Author_Institution
    Inst. of Inf. Technol. & Electron., Tech. Univ. of Liberec, Liberec, Czech Republic
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In this paper we present our approach to the rapid and efficient development of an automatic speech recognition (ASR) system for Russian. We try to utilize the tools, procedures and data that we previously designed and collected for two other Slavic languages, Czech and Slovak. We show how we build a large corpus of texts acquired from major publishers' web pages and convert it from Cyrillic to Latin script to simplify further processing. The corpus is used to create a representative lexicon with 218K words and 259K pronunciations, and a probabilistic language model. When training the acoustic model (AM), we use the GlobalPhone database of recordings and a largely automated scheme that includes bootstrapping with an existing Czech AM and several iterative steps to gradually improve both the phonetic annotations and the target Russian AM. The recent prototype of the Russian ASR system is evaluated on the test part of the GlobalPhone database and achieves a word error rate of 18.2%.
  • Keywords
    computer bootstrapping; natural language processing; speech recognition; Cyrillic; Czech; Czech AM; GlobalPhone database; Latin; Russian AM; Russian ASR system; Russian language; Slavic languages; Slovak; acoustic model; automatic speech recognition system; bootstrapping; lexicon; phonetic annotations; probabilistic language model; pronunciations; Acoustics; Databases; Prototypes; Speech; Stress; Training; Vocabulary; Russian; language model; multi-lingual
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Electronics, Control, Measurement, Signals and their Application to Mechatronics (ECMSM), 2015 IEEE International Workshop of
  • Conference_Location
    Liberec
  • Print_ISBN
    978-1-4799-6970-8
  • Type
    conf
  • DOI
    10.1109/ECMSM.2015.7208686
  • Filename
    7208686