Title :
Realizing Tibetan Lhasa speech concatenation synthesis system based on a large corpus
Author :
Zhenye Gan ; Zhenwen Wang ; Hongwu Yang
Author_Institution :
Coll. of Phys. & Electron. Eng., Northwest Normal Univ., Lanzhou, China
Abstract :
This paper presents a method for realizing Tibetan Lhasa speech concatenation synthesis based on a large corpus. A large corpus of the Tibetan Lhasa dialect is established by analyzing the characteristics of the dialect. A grapheme-to-phoneme conversion method is implemented to convert Tibetan sentences into Speech Assessment Methods Phonetic Alphabet (SAMPA)-based Pinyin sequences. First, Tibetan text is converted to Pinyin sequences using the SAMPA-T transformation method. Then, with Tibetan acoustic finals and syllables as units, a Classification and Regression Tree (CART) is built according to the spectral distance of the candidate units and the context-dependent question sets. The CART algorithm is applied to choose the acoustic finals and syllables that best conform to the context information. Finally, Tibetan Lhasa speech is synthesized by the waveform concatenation synthesis method. Tests show that the mean opinion score (MOS) of synthetic Tibetan Lhasa speech is 3.9 points with acoustic finals as units and 4.1 points with syllables as units. Speech synthesized with syllables as units is therefore of better quality than speech synthesized with acoustic finals as units.
Keywords :
regression analysis; speech synthesis; text analysis; CART algorithm; SAMPA-T transformation method; SAMPA-based Pinyin sequences; Tibetan Lhasa dialect; Tibetan Lhasa speech concatenation synthesis system; Tibetan acoustic finals; Tibetan sentences; Tibetan text; classification; context dependent question sets; grapheme-to-phoneme conversion method; large corpus; regression tree; speech assessment methods phonetic alphabet; syllables; synthesized Tibetan Lhasa speech; synthetic Tibetan Lhasa speech; waveform concatenation synthesis method; Acoustics; Dictionaries; Educational institutions; Hidden Markov models; Speech; Speech synthesis; Classification and Regression Tree; Corpus; Tibetan Lhasa dialects; Waveform concatenation synthesis;
Conference_Title :
Orange Technologies (ICOT), 2014 IEEE International Conference on
Conference_Location :
Xian
DOI :
10.1109/ICOT.2014.6956607