Title :
Improving F0 prediction using bidirectional associative memories and syllable-level F0 features for HMM-based Mandarin speech synthesis
Author :
Li Gao ; Zhen-Hua Ling ; Ling-Hui Chen ; Li-Rong Dai
Author_Institution :
Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
Abstract :
The speech generated by the hidden Markov model (HMM) based speech synthesis method always sounds monotonous compared with natural recordings. An important reason is that the predicted F0 trajectories are over-smoothed. This arises from the adoption of frame-level F0 features and the averaging effect of acoustic modeling using Gaussians in the conventional F0 modeling approach. In this paper, we propose a post-filtering method to improve the F0 prediction of HMM-based Mandarin speech synthesis. Syllable-level F0 features, e.g., length-normalized logF0 vectors or quantitative target approximation (qTA) parameters, are extracted from the F0 trajectories predicted by the conventional approach. These features are mapped towards natural ones by a Gaussian bidirectional associative memory (GBAM) based transformation. Our subjective experiments indicate that the GBAM-based F0 post-filtering method using either logF0 vectors or qTA parameters can significantly improve the naturalness of synthetic speech. Using raw logF0 vectors for post-filtering achieves better performance than using derived qTA parameters.
Keywords :
Gaussian processes; approximation theory; filtering theory; hidden Markov models; natural language processing; speech synthesis; vectors; F0 prediction; F0 trajectories; GBAM-based F0 post-filtering method; GBAM-based transformation; Gaussian bidirectional associative memory; HMM-based Mandarin speech synthesis; logF0 vectors; hidden Markov model based speech synthesis method; length-normalized logF0 vectors; qTA parameters; quantitative target approximation parameters; raw logF0 vectors; syllable-level F0 features; synthetic speech; Feature extraction; Hidden Markov models; Speech; Speech synthesis; Training; Trajectory; Vectors; bidirectional associative memory; hidden Markov model; speech synthesis; target approximation
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location :
Singapore
DOI :
10.1109/ISCSLP.2014.6936598