• DocumentCode
    3251374
  • Title
    Iterative grapheme-to-phoneme alignment for the training of WFST-based phonetic conversion
  • Author
    Bohac, Marek ; Malek, Jiri ; Blavka, Karel
  • Author_Institution
    Inst. of Inf. Technol. & Electron., Tech. Univ. of Liberec, Liberec, Czech Republic
  • fYear
    2013
  • fDate
    2-4 July 2013
  • Firstpage
    474
  • Lastpage
    478
  • Abstract
    In this paper we propose an algorithm for grapheme-to-phoneme (G2P) alignment. Such alignment is needed mainly for the data-driven training of G2P conversion tools. Our approach uses a given phonetic alphabet and a set of given orthographic-phonetic word pairs as a source of prior knowledge. The development data are taken from a manually created pronunciation lexicon for a large-vocabulary speech recognition system for Czech. The alignment method is based on an extended Minimum Edit Distance algorithm. Moreover, we propose an approach that avoids the need to create reference alignments: we evaluate the improvements through a specially designed G2P converter, i.e., we compare the resulting phonetic transcription directly to a set of test orthographic-phonetic word pairs. The results of our approach are comparable to, or even slightly better than, the state of the art.
  • Keywords
    iterative methods; speech processing; speech recognition; Czech; data-driven training; iterative grapheme-to-phoneme alignment; minimum edit distance algorithm; orthographic-phonetic word pairs; phonetic conversion; phonetic transcription; vocabulary speech recognition system; weighted finite state transducers; Dictionaries; Educational institutions; Measurement; Speech recognition; Training; Training data; Vocabulary; Alignment; Grapheme-to-phoneme; Phonetisaurus; WFST; conversion;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Telecommunications and Signal Processing (TSP), 2013 36th International Conference on
  • Conference_Location
    Rome
  • Print_ISBN
    978-1-4799-0402-0
  • Type
    conf
  • DOI
    10.1109/TSP.2013.6613977
  • Filename
    6613977
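
The abstract states that the alignment method builds on an extended Minimum Edit Distance algorithm. As a rough illustration only, the sketch below aligns a grapheme sequence to a phoneme sequence with the plain, unit-cost Minimum Edit Distance and a backtrace. It is not the authors' extended variant: the function name `med_align`, the gap symbol, the unit costs, and the rule that a symbol matches only when the grapheme and phoneme are written identically are all assumptions made for this example.

```python
def med_align(graphemes, phonemes, gap="-"):
    """Align two symbol sequences with unit-cost Minimum Edit Distance.

    Returns (cost, pairs), where pairs is a list of (grapheme, phoneme)
    tuples and `gap` marks a symbol with no counterpart.
    Assumption: substitution costs 0 for identical symbols and 1 otherwise;
    insertions and deletions cost 1. The paper's extended costs are not
    reproduced here.
    """
    n, m = len(graphemes), len(phonemes)

    # dist[i][j] = edit distance between the first i graphemes
    # and the first j phonemes.
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if graphemes[i - 1] == phonemes[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub,  # match or substitution
                dist[i - 1][j] + 1,        # grapheme with no phoneme
                dist[i][j - 1] + 1,        # phoneme with no grapheme
            )

    # Backtrace from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if graphemes[i - 1] == phonemes[j - 1] else 1
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + sub:
            pairs.append((graphemes[i - 1], phonemes[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((graphemes[i - 1], gap))
            i -= 1
        else:
            pairs.append((gap, phonemes[j - 1]))
            j -= 1

    pairs.reverse()
    return dist[n][m], pairs
```

Calling `med_align(list(word), phoneme_list)` on an orthographic-phonetic word pair yields one lowest-cost pairing of letters and phonemes. A G2P-specific variant would presumably replace the identity test with costs derived from the phonetic alphabet, which is the kind of prior knowledge the abstract mentions.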