DocumentCode :
573252
Title :
Using multilayer perceptron for voicing strength estimation in HMM-based speech synthesis
Author :
Ogbureke, Udochukwu ; Cabral, Joao ; Berndsen, Julie
Author_Institution :
Sch. of Comput. Sci. & Inf., Univ. Coll. Dublin, Dublin, Ireland
fYear :
2012
fDate :
2-5 July 2012
Firstpage :
683
Lastpage :
688
Abstract :
Fundamental frequency (F0) modelling is important for speech processing applications such as text-to-speech (TTS) synthesis. The most common method for modelling F0 in HMM-based speech synthesis is to use a mixture of discrete and continuous distributions for unvoiced and voiced speech, respectively. This type of model is used because most F0 detection algorithms require a voiced/unvoiced (V/U) decision, and F0 is set to a constant value in the unvoiced regions of speech, where it is not defined. However, errors in voicing detection degrade speech quality. The effect of voicing decision errors can be reduced by modelling F0 using continuous HMMs. This approach requires estimating a voicing strength parameter, which is used to decide whether a speech frame is voiced or unvoiced when the speech waveform is generated from the speech parameters. This paper proposes a method for voicing strength estimation based on a multilayer perceptron (MLP) and compares it with a baseline method based on signal processing. Results show that the MLP method obtains a lower V/U mean error rate than the baseline.
Keywords :
hidden Markov models; multilayer perceptrons; parameter estimation; signal detection; speech synthesis; F0 detection algorithms; F0 modelling; HMM-based speech synthesis; MLP method; TTS synthesis; V-U mean error rate; baseline method; fundamental frequency modelling; hidden Markov model; multilayer perceptron; signal processing; speech quality; speech waveform; text-to-speech synthesis; voiced-unvoiced decision; voicing decision error effect; voicing detection; voicing strength parameter estimation; Estimation; Hidden Markov models; Robustness; Speech; Speech synthesis; Synthesizers; Training; HMM-based TTS; Voicing strength estimation; multilayer perceptron;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Science, Signal Processing and their Applications (ISSPA), 2012 11th International Conference on
Conference_Location :
Montreal, QC
Print_ISBN :
978-1-4673-0381-1
Electronic_ISBN :
978-1-4673-0380-4
Type :
conf
DOI :
10.1109/ISSPA.2012.6310640
Filename :
6310640