Title :
Improvement of naturalness for an HMM-based Vietnamese speech synthesis using the prosodic information
Author :
Thanh-Son Phan ; Tu-Cuong Duong ; Anh-Tuan Dinh ; Tat-Thang Vu ; Chi-Mai Luong
Author_Institution :
Fac. of Inf. Technol., Le Qui Don Tech. Univ., Hanoi, Vietnam
Abstract :
Natural-sounding synthesized speech is the goal of HMM-based Text-to-Speech systems. Besides context-dependent tri-phone units drawn from a large speech corpus, many prosodic features have been used in full-context labels to improve the naturalness of HMM-based Vietnamese synthesizers. In the prosodic specification, tone, part-of-speech (POS) and intonation information are considered less important than positional information. Context-dependent information includes the phoneme sequence as well as prosodic information, because the naturalness of synthetic speech depends heavily on prosody such as pauses, tone, intonation pattern, and segmental duration. In this paper, we propose decision tree questions that use context-dependent tones and investigate the impact of POS and intonation tagging on the naturalness of the HMM-based voice. Experimental results from an objective evaluation and a MOS test show that the proposed method can improve the naturalness of an HMM-based Vietnamese TTS system.
Keywords :
decision trees; hidden Markov models; natural language processing; speech synthesis; HMM-based Vietnamese TTS naturalness improvement; HMM-based Vietnamese speech synthesis; HMM-based text-to-speech systems; HMM-based voice; MOS test; POS; context dependent triphone units; context-dependent information; context-dependent tones; decision tree questions; full-context labels; intonation information; intonation pattern; intonation tagging; large corpus speech database; natural-sounding synthesized speech; objective evaluation; part-of-speech; pause; phoneme sequence; positional information; prosodic information; prosodic specification; prosody features; segmental duration; synthetic speech; Context; Databases; Decision trees; Hidden Markov models; Speech; Training; Vectors; HMM; HTS; Vietnamese Speech Synthesis; context-dependent; decision tree-based clustering; tri-phone
Conference_Title :
Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2013 IEEE RIVF International Conference on
Conference_Location :
Hanoi
Print_ISBN :
978-1-4799-1349-7
DOI :
10.1109/RIVF.2013.6719907