Regional Information Center for Science and Technology (RICeST) - A waveform representation framework for high-quality statistical parametric speech synthesis

DocumentCode :

3752079

Title :

A waveform representation framework for high-quality statistical parametric speech synthesis

Author :

Bo Fan;Siu Wa Lee;Xiaohai Tian;Lei Xie;Minghui Dong

Author_Institution :

School of Computer Science, Northwestern Polytechnical University, Xi'an, China

fYear :

2015

Firstpage :

530

Lastpage :

536

Abstract :

State-of-the-art statistical parametric speech synthesis (SPSS) generally uses a vocoder to represent speech signals and parameterize them into features for subsequent modeling. The magnitude spectrum has been the dominant feature over the years. Although perceptual studies have shown that the phase spectrum is essential to the quality of synthesized speech, phase is often ignored: a minimum-phase filter is used during synthesis, and speech quality suffers. To bypass this bottleneck in vocoded speech, this paper proposes a phase-embedded waveform representation framework and establishes a magnitude-phase joint modeling platform for high-quality SPSS. Our waveform reconstruction experiments show that the proposed representation performs better than the widely used STRAIGHT vocoder. Furthermore, the proposed modeling and synthesis platform outperforms a leading-edge, vocoded baseline system based on a deep bidirectional long short-term memory recurrent neural network (DBLSTM-RNN) on various objective evaluation metrics.
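The abstract's point that a minimum-phase filter discards phase can be illustrated with the standard real-cepstrum (homomorphic) construction. This is a minimal sketch of the conventional approach the paper moves away from, not the authors' phase-embedded method: it rebuilds a signal from its magnitude spectrum alone, so two signals that differ only in phase (here, a signal and its circular time reversal) map to the same output. The function names `dft`, `idft`, and `minimum_phase` are hypothetical, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (dependency-free, for illustration)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT, the counterpart of dft()."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def minimum_phase(x, floor=1e-12):
    """Return a minimum-phase signal whose DFT magnitude matches that of x.

    Only |DFT(x)| is used, so whatever phase x carried is thrown away.
    Hypothetical helper using real-cepstrum folding; len(x) must be even.
    """
    n = len(x)
    if n % 2:
        raise ValueError("length must be even")
    # Log-magnitude spectrum; the floor avoids log(0) at spectral nulls.
    log_mag = [math.log(max(abs(v), floor)) for v in dft(x)]
    # Real cepstrum: inverse DFT of the log magnitude (real and even).
    cep = [c.real for c in idft(log_mag)]
    # Fold the anticausal half onto the causal half; the even part is unchanged,
    # so the magnitude spectrum is preserved while the phase becomes minimum phase.
    folded = [0.0] * n
    folded[0] = cep[0]
    for t in range(1, n // 2):
        folded[t] = 2.0 * cep[t]
    folded[n // 2] = cep[n // 2]
    spec_min = [cmath.exp(v) for v in dft(folded)]
    # spec_min is Hermitian, so the time signal is real up to rounding error.
    return [v.real for v in idft(spec_min)]
```

For example, `x = [1.0, -0.8, 0.5, 0.3, -0.2, 0.1] + [0.0] * 10` and its circular time reversal `[x[0]] + x[:0:-1]` have identical magnitude spectra but different phase, and `minimum_phase` returns the same signal for both. That collapse of distinct phase spectra onto a single minimum-phase response is the information loss the abstract attributes to conventional vocoded synthesis.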

Keywords :

"Speech","Hidden Markov models","Trajectory","Vocoders","Time-domain analysis","Speech synthesis","Robustness"

Publisher :

IEEE

Conference_Title :

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

Type :

conf

DOI :

10.1109/APSIPA.2015.7415327

Filename :

7415327

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3752079