Multi-distribution deep belief network for speech synthesis

Author

Shiyin Kang ; Xiaojun Qian ; Meng, Hsiang-Yun

Author_Institution

Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, Hong Kong, China

fYear

2013

Firstpage

8012

Lastpage

8016

Abstract

The deep belief network (DBN) has been shown to be a good generative model in tasks such as handwritten digit image generation. Previous work on DBNs in the speech community has mainly focused on using a generatively pre-trained DBN to initialize a discriminative model for better acoustic modeling in speech recognition (SR). To fully utilize its generative nature, we propose to model the speech parameters, including spectrum and F0, simultaneously and to generate these parameters from the DBN for speech synthesis. Objective evaluation shows that the spectrum generated from the DBN has less distortion than that of the predominant HMM-based approach. Subjective results also confirm the advantage of the spectrum from the DBN, and the overall quality is comparable to that of a context-independent HMM.

Keywords

belief networks; handwriting recognition; hidden Markov models; speech recognition; speech synthesis; DBN; HMM-based approach; SR; acoustic modeling; handwritten digit image generation; multidistribution deep belief network; speech community; speech parameters; Acoustics; Speech; Training; Deep belief network

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location

Vancouver, BC

ISSN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2013.6639225

Filename

6639225