DocumentCode :
1386101
Title :
Voice Conversion Using Dynamic Frequency Warping With Amplitude Scaling, for Parallel or Nonparallel Corpora
Author :
Godoy, Elizabeth ; Rosec, Olivier ; Chonavel, Thierry
Author_Institution :
TECH/ASAP/VOICE, Orange Labs., Lannion, France
Volume :
20
Issue :
4
fYear :
2012
fDate :
5/1/2012
Firstpage :
1313
Lastpage :
1323
Abstract :
In Voice Conversion (VC), the speech of a source speaker is modified to resemble that of a particular target speaker. Currently, standard VC approaches use Gaussian mixture model (GMM)-based transformations that do not generate high-quality converted speech due to “over-smoothing” resulting from weak links between individual source and target frame parameters. Dynamic Frequency Warping (DFW) offers an appealing alternative to GMM-based methods, as more spectral details are maintained in transformation; however, the speaker timbre is less successfully converted because spectral power is not adjusted explicitly. Previous work combines separate GMM- and DFW-transformed spectral envelopes for each frame. This paper proposes a more effective DFW-based approach that (1) does not rely on the baseline GMM methods, and (2) functions on the acoustic class level. To adjust spectral power, an amplitude scaling function is used that compares the average target and warped source log spectra for each acoustic class. The proposed DFW with Amplitude scaling (DFWA) outperforms standard GMM and hybrid GMM-DFW methods for VC in terms of both speech quality and timbre conversion, as is confirmed in extensive objective and subjective testing. Furthermore, by not requiring time-alignment of source and target speech, DFWA is able to perform equally well using parallel or nonparallel corpora, as is demonstrated explicitly.
Keywords :
acoustic signal processing; speech processing; acoustic class level; amplitude scaling; dynamic frequency warping; nonparallel corpora; parallel corpora; speaker timbre; voice conversion; Correlation; Probabilistic logic; Smoothing methods; Speech; Timbre; Vectors; Frequency warping; Gaussian mixture model (GMM); Voice Conversion (VC); spectral envelope
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2011.2177820
Filename :
6093737