DocumentCode
177487
Title
Excitation modeling for HMM-based speech synthesis: Breaking down the impact of periodic and aperiodic components
Author
Drugman, Thomas ; Raitio, Tuomo
Author_Institution
TCTS Lab., Univ. of Mons, Mons, Belgium
fYear
2014
fDate
4-9 May 2014
Firstpage
260
Lastpage
264
Abstract
HMM-based speech synthesis generally suffers from a typical buzziness caused by over-simplified excitation modeling of voiced speech. To alleviate this effect, several studies have proposed new excitation models. However, no consensus has been reached on the perceptual importance of accurately modeling the periodic and aperiodic components of voiced speech, or on the extent to which each separately contributes to improving naturalness. This paper considers a generalized mixed excitation model, common to various existing approaches, in which periodic and aperiodic components coexist. At least three main factors may alter the quality of synthesis: the periodic waveform, the noise spectral weighting, and the noise time envelope. Based on a large subjective evaluation, the goal of this paper is threefold: i) to evaluate the relative perceptual importance of each factor, ii) to investigate the most appropriate method for modeling the periodic and aperiodic components, and iii) to provide prospective clues for future work in excitation modeling.
Keywords
hidden Markov models; speech synthesis; HMM; aperiodic components; noise spectral weighting; noise time envelope; over-simplified excitation modeling; periodic waveform; voiced speech; Feature extraction; Frequency modulation; Noise; Speech; Vocoders; HMM-based speech synthesis; excitation modeling; glottal flow; residual signal
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location
Florence
Type
conf
DOI
10.1109/ICASSP.2014.6853598
Filename
6853598