A hybrid text-to-speech based on sub-band approach

Author

Inoue, Takuma ; Hara, Sunao ; Abe, Masanobu

Author_Institution

Dept. of Comput. Sci., Okayama Univ., Okayama, Japan

fYear

2014

fDate

9-12 Dec. 2014

Firstpage

1

Lastpage

4

Abstract

This paper proposes a sub-band speech synthesis approach for developing a high-quality Text-to-Speech (TTS) system. Hidden Markov Model (HMM)-based speech synthesis is used for the low-frequency band, and waveform-based speech synthesis is used for the high-frequency band. Both methods are widely known to perform well, and each has its own benefits and shortcomings. One motivation is to apply the right speech synthesis method to the right frequency band. Experimental results show that the proposed approach outperforms waveform-based speech synthesis in terms of smoothness and outperforms HMM-based speech synthesis in terms of clarity. Consequently, the proposed approach combines the inherent benefits of both waveform-based and HMM-based speech synthesis.
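
The band assignment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the bands are split or recombined, so the first-order crossover filter, the function names `one_pole_lowpass` and `combine_subbands`, and the 4 kHz cutoff in the usage lines are all assumptions made for illustration. The sketch takes two already-synthesized waveforms of equal length, keeps the low band of the HMM-based one and the high band of the waveform-based one, and sums them.

```python
import math


def one_pole_lowpass(signal, cutoff_hz, sample_rate_hz):
    """First-order IIR low-pass filter (illustrative only; a practical
    system would likely use a much sharper filter)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    state = 0.0
    out = []
    for x in signal:
        state += alpha * (x - state)  # move the state toward the input
        out.append(state)
    return out


def combine_subbands(hmm_wave, unit_wave, cutoff_hz, sample_rate_hz):
    """Hypothetical helper: low band from the HMM-based waveform,
    high band from the waveform-based (concatenative) waveform."""
    if len(hmm_wave) != len(unit_wave):
        raise ValueError("both waveforms must have the same length")
    low = one_pole_lowpass(hmm_wave, cutoff_hz, sample_rate_hz)
    unit_low = one_pole_lowpass(unit_wave, cutoff_hz, sample_rate_hz)
    # Complementary high band: the input minus its own low-pass output.
    high = [x - l for x, l in zip(unit_wave, unit_low)]
    return [l + h for l, h in zip(low, high)]


# Usage with placeholder signals; the cutoff and sample rate are assumed values.
hmm_wave = [math.sin(0.05 * n) for n in range(400)]
unit_wave = [math.sin(0.05 * n) + 0.2 * math.sin(2.5 * n) for n in range(400)]
hybrid = combine_subbands(hmm_wave, unit_wave, 4000.0, 16000.0)
```

Because the high band is formed as the complement of the low band, feeding the same waveform to both inputs returns that waveform unchanged, which is a convenient sanity check for any crossover of this kind.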

Keywords

hidden Markov models; speech synthesis; HMM; Hidden Markov Model; TTS; hybrid text-to-speech; speech synthesis methods; subband speech synthesis approach; waveform based speech synthesis; Frequency synthesizers; Harmonic analysis; Hidden Markov models; Speech; Speech synthesis; Vocoders;

fLanguage

English

Publisher

IEEE

Conference_Titel

Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)

Conference_Location

Siem Reap

Type

conf

DOI

10.1109/APSIPA.2014.7041575

Filename

7041575