• DocumentCode
    118037
  • Title
    Gender-dependent spectrum differential models for perceived age control based on direct waveform modification in singing voice conversion
  • Author
    Kobayashi, Kazuhiro ; Toda, Tomoki ; Nakano, Tomoyasu ; Goto, Masataka ; Neubig, Graham ; Sakti, Sakriani ; Nakamura, Satoshi
  • Author_Institution
    Grad. Sch. of Inf. Sci., Nara Inst. of Sci. & Technol. (NAIST), Ikoma, Japan
  • fYear
    2014
  • fDate
    9-12 Dec. 2014
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    The perceived age of a singing voice, that is, the age of the singer as perceived by the listener, is one of the intuitively understandable measures for describing the voice characteristics of a singing voice. Singers can sing expressively by controlling voice timbre to some extent, but the varieties of voice timbre that singers can produce are limited by physical constraints. To overcome this limitation, previous work has proposed a statistical voice timbre control technique based on the perceived age. This technique makes it possible to control the perceived age of a singing voice while retaining singer individuality by using statistical voice conversion (SVC) with a multiple-regression Gaussian mixture model (MR-GMM). However, the range of controllable perceived age is limited, and the speech quality of the converted singing voice is significantly degraded compared to that of a natural singing voice. In this paper, we propose a method for perceived age control using direct waveform modification based on the spectrum differential and gender-dependent modeling. The experimental results show that the proposed method widens the range of controllable perceived age and improves the quality of the converted singing voice compared to the conventional method.
  • Keywords
    Gaussian processes; mixture models; regression analysis; speech processing; MR-GMM; SVC; direct waveform modification; gender-dependent modeling; gender-dependent spectrum differential model; multiple-regression Gaussian mixture model; natural singing voice; perceived age control; physical constraint; singing voice conversion; spectrum differential; speech quality; statistical voice conversion; statistical voice timbre control technique; Joints; Speech; Static VAr compensators; Timbre; Training; Vectors; Vocoders;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
  • Conference_Location
    Siem Reap
  • Type
    conf
  • DOI
    10.1109/APSIPA.2014.7041590
  • Filename
    7041590