DocumentCode
134206
Title
Improving F0 prediction using bidirectional associative memories and syllable-level F0 features for HMM-based Mandarin speech synthesis
Author
Li Gao ; Zhen-Hua Ling ; Ling-Hui Chen ; Li-Rong Dai
Author_Institution
Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
fYear
2014
fDate
12-14 Sept. 2014
Firstpage
275
Lastpage
279
Abstract
The speech generated by hidden Markov model (HMM) based speech synthesis methods always sounds monotonous compared with natural recordings. An important reason is that the predicted F0 trajectories are over-smoothed. This arises from the adoption of frame-level F0 features and the averaging effect of acoustic modeling using Gaussians in the conventional F0 modeling approach. In this paper, we propose a method to improve the F0 prediction of HMM-based Mandarin speech synthesis by post-filtering. Syllable-level F0 features, e.g., length-normalized logF0 vectors or quantitative target approximation (qTA) parameters, are extracted from the F0 trajectories predicted by the conventional approach. These features are mapped towards natural ones by a Gaussian bidirectional associative memory (GBAM) based transformation. Our subjective experiments indicate that the GBAM-based F0 post-filtering method using either logF0 vectors or qTA parameters can significantly improve the naturalness of synthetic speech. Using raw logF0 vectors for post-filtering achieves better performance than using derived qTA parameters.
Keywords
Gaussian processes; approximation theory; filtering theory; hidden Markov models; natural language processing; speech synthesis; vectors; F0 prediction; F0 trajectories; GBAM-based F0 post-filtering method; GBAM-based transformation; Gaussian bidirectional associative memory; HMM-based Mandarin speech synthesis; logF0 vectors; hidden Markov model based speech synthesis method; length-normalized logF0 vectors; qTA parameters; quantitative target approximation parameters; raw logF0 vectors; syllable-level F0 features; synthetic speech; Feature extraction; Hidden Markov models; Speech; Speech synthesis; Training; Trajectory; Vectors; bidirectional associative memory; hidden Markov model; speech synthesis; target approximation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location
Singapore
Type
conf
DOI
10.1109/ISCSLP.2014.6936598
Filename
6936598