Regional Information Center for Science and Technology (RICEST) - Vocaine the vocoder and applications in speech synthesis

DocumentCode :

730660

Title :

Vocaine the vocoder and applications in speech synthesis

Author :

Agiomyrgiannakis, Yannis

Author_Institution :

Google, London, UK

fYear :

2015

fDate :

19-24 April 2015

Firstpage :

4230

Lastpage :

4234

Abstract :

Vocoders have recently received renewed attention as basic components in speech synthesis applications such as voice transformation, voice conversion and statistical parametric speech synthesis. This paper presents a new vocoder synthesizer, referred to as Vocaine, that features a novel Amplitude Modulated-Frequency Modulated (AM-FM) speech model, a new way to synthesize non-stationary sinusoids using quadratic phase splines, and a super-fast cosine generator. Extensive evaluations are made against several state-of-the-art methods in Copy-Synthesis and Text-To-Speech synthesis experiments. Vocaine matches or outperforms STRAIGHT in Copy-Synthesis experiments and outperforms our baseline real-time optimized Mixed-Excitation vocoder at the same computational cost. We report that Vocaine considerably improves our statistical TTS synthesizers and that our new statistical parametric synthesizer [1] matched the quality of our mature production Unit-Selection system with uncompressed waveforms.

Keywords :

amplitude modulation; frequency modulation; speech synthesis; vocoders; AM-FM speech model; Vocaine; amplitude modulated-frequency modulated speech model; baseline real-time optimized mixed-excitation vocoder; copy-synthesis; non-stationary sinusoids; quadratic phase splines; statistical TTS synthesizers; statistical parametric speech synthesis; statistical parametric synthesizer; super fast cosine generator; text-to-speech synthesis experiments; uncompressed waveforms; unit-selection system; voice conversion; voice transformation; Hidden Markov models; Speech; Synthesizers; AM-FM; fast cosine generators; non-stationary; overlap-add; phase models; sinusoidal speech models; text-to-speech

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Conference_Location :

South Brisbane, QLD, Australia

Type :

conf

DOI :

10.1109/ICASSP.2015.7178768

Filename :

7178768

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=730660