Gender-dependent spectrum differential models for perceived age control based on direct waveform modification in singing voice conversion

Author

Kobayashi, Kazuhiro ; Toda, Tomoki ; Nakano, Tomoyasu ; Goto, Masataka ; Neubig, Graham ; Sakti, Sakriani ; Nakamura, Satoshi

Author_Institution

Grad. Sch. of Inf. Sci., Nara Inst. of Sci. & Technol. (NAIST), Ikoma, Japan

fYear

2014

fDate

9-12 Dec. 2014

Firstpage

1

Lastpage

4

Abstract

The perceived age of a singing voice, that is, the age of the singer as perceived by the listener, is one of the intuitively understandable measures for describing the voice characteristics of a singing voice. Singers can sing expressively by controlling their voice timbre to some extent, but the variety of timbres a singer can produce is limited by physical constraints. To overcome this limitation, previous work proposed a statistical voice timbre control technique based on perceived age. This technique makes it possible to control the perceived age of a singing voice while retaining singer individuality by using statistical voice conversion (SVC) with a multiple-regression Gaussian mixture model (MR-GMM). However, the range of controllable perceived age is limited, and the speech quality of the converted singing voice is significantly degraded compared with that of a natural singing voice. In this paper, we propose a method for perceived age control that uses direct waveform modification based on the spectrum differential, combined with gender-dependent modeling. The experimental results show that, compared with the conventional method, the proposed method widens the range of controllable perceived age and improves the quality of the converted singing voice.

Keywords

Gaussian processes; mixture models; regression analysis; speech processing; MR-GMM; SVC; direct waveform modification; gender-dependent modeling; gender-dependent spectrum differential model; multiple-regression Gaussian mixture model; natural singing voice; perceived age control; physical constraint; singing voice conversion; spectrum differential; speech quality; statistical voice conversion; statistical voice timbre control technique; Joints; Speech; Timbre; Training; Vectors; Vocoders

fLanguage

English

Publisher

IEEE

Conference_Title

Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)

Conference_Location

Siem Reap

Type

conf

DOI

10.1109/APSIPA.2014.7041590

Filename

7041590