DocumentCode
1693866
Title
Multi-distribution deep belief network for speech synthesis
Author
Shiyin Kang ; Xiaojun Qian ; Helen Meng
Author_Institution
Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, Hong Kong, China
fYear
2013
Firstpage
8012
Lastpage
8016
Abstract
Deep belief network (DBN) has been shown to be a good generative model in tasks such as hand-written digit image generation. Previous work on DBN in the speech community mainly focuses on using the generatively pre-trained DBN to initialize a discriminative model for better acoustic modeling in speech recognition (SR). To fully utilize its generative nature, we propose to model the speech parameters including spectrum and F0 simultaneously and generate these parameters from DBN for speech synthesis. Compared with the predominant HMM-based approach, objective evaluation shows that the spectrum generated from DBN has less distortion. Subjective results also confirm the advantage of the spectrum from DBN, and the overall quality is comparable to that of context-independent HMM.
Keywords
belief networks; handwriting recognition; hidden Markov models; speech recognition; speech synthesis; DBN; HMM-based approach; SR; acoustic modeling; handwritten digit image generation; multidistribution deep belief network; speech community; speech parameters; Acoustics; Speech; Training; Deep belief network
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location
Vancouver, BC
ISSN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2013.6639225
Filename
6639225