DocumentCode :
177483
Title :
Natural speech synthesis based on hybrid approach with candidate expansion and verification
Author :
Chung-Hsien Wu ; Yi-Chin Huang ; Shih-Lun Lin ; Chia-Ping Chen
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
250
Lastpage :
254
Abstract :
A hybrid Mandarin speech synthesis system combining concatenation-based and model-based methodologies is investigated in this research. To effectively exploit a small-size corpus, the candidate sets for unit selection are expanded via clusters based on articulatory features (AF), which are estimated as the outputs of an artificial neural network. This is followed by a filtering operation incorporating residual compensation to remove unsuitable units. Given an input text, an optimal unit sequence is decided by the minimization of a total cost, which depends on the spectral features, contextual articulatory features, formants, and pitch values. Furthermore, prosodic word verification is integrated to check the smoothness of the output speech. The units failing to pass the prosodic word verification are replaced by model-based synthesized units for better speech quality. Objective and subjective evaluations have been conducted. Comparisons among the proposed method, the HMM-based method, and the conventional hybrid method clearly show that candidate set expansion based on articulatory features leads to more units suitable for selection, and that the verification process is effective in improving the naturalness of the output speech.
Keywords :
filtering theory; hidden Markov models; natural language processing; neural nets; speech synthesis; HMM-based method; articulatory features; candidate set expansion; candidate verification; concatenation-based methodology; contextual articulatory features; filtering operation; formants; hidden Markov model; hybrid Mandarin speech synthesis system; hybrid approach; model-based methodology; natural speech synthesis; neural network; optimal unit sequence; pitch values; prosodic word verification; residual compensation; small-size corpus; spectral features; speech quality; total cost minimization; unit selection; unsuitable unit removal; Decision trees; Hidden Markov models; High-temperature superconductors; Pragmatics; Speech; Speech synthesis; Candidate expansion; Hybrid speech synthesis; Residual compensation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6853596
Filename :
6853596