  • DocumentCode
    149608
  • Title
    Voice source modelling using deep neural networks for statistical parametric speech synthesis
  • Author
    Raitio, Tuomo; Lu, Heng; Kane, John; Suni, Antti; Vainio, Markku; King, Simon; Alku, Paavo
  • Author_Institution
    Dept. of Signal Process. & Acoust., Aalto Univ., Espoo, Finland
  • fYear
    2014
  • fDate
    1-5 Sept. 2014
  • Firstpage
    2290
  • Lastpage
    2294
  • Abstract
    This paper presents a voice source modelling method that employs a deep neural network (DNN) to map from acoustic features to the time-domain glottal flow waveform. First, acoustic features and the glottal flow signal are estimated from each frame of the speech database. Pitch-synchronous glottal flow time-domain waveforms are extracted, interpolated to a constant duration, and stored in a codebook. Then, a DNN is trained to map from acoustic features to these duration-normalised glottal waveforms. At synthesis time, acoustic features are generated from a statistical parametric model, and from these the trained DNN predicts the glottal flow waveform. Illustrations demonstrate that the proposed method successfully synthesises the glottal flow waveform and allows the waveform to be modified easily by adjusting the input values to the DNN. In a subjective listening test, the proposed method was rated as equal in quality to a high-quality method employing a stored glottal flow waveform.
  • Keywords
    acoustic signal processing; neural nets; speech synthesis; statistical analysis; time-domain analysis; waveform analysis; DNN; acoustic features; deep neural networks; glottal flow signal; speech database; statistical parametric speech synthesis; time-domain glottal flow waveform; voice source modelling; Acoustics; Feature extraction; Hidden Markov models; Neural networks; Speech; Speech synthesis; Training; DNN; Deep neural network; glottal flow; statistical parametric speech synthesis; voice source modelling;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Signal Processing Conference (EUSIPCO), 2014 Proceedings of the 22nd European
  • Conference_Location
    Lisbon
  • Type
    conf
  • Filename
    6952838
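The abstract describes pitch-synchronous glottal flow cycles being interpolated to a constant duration and stored in a codebook, with the DNN then predicting such duration-normalised waveforms. The sketch below is a minimal illustration of that normalisation step and of its inverse (stretching a fixed-length pulse back to the period implied by a target F0). It is not the authors' implementation: the normalised length `NORM_LEN`, the use of linear interpolation via `numpy.interp`, and the helper names `normalise_cycle`, `denormalise_cycle` and `build_codebook` are all hypothetical choices made for this example. In the described system, rows of such a codebook would serve as DNN training targets, and the DNN's output would pass through the inverse step at synthesis time.

```python
import numpy as np

# Hypothetical constant length for duration-normalised glottal pulses;
# the value used in the paper is not stated in this record.
NORM_LEN = 400


def normalise_cycle(cycle, norm_len=NORM_LEN):
    """Resample one pitch-synchronous glottal flow cycle to a fixed length.

    Linear interpolation is an assumption made for this sketch.
    """
    cycle = np.asarray(cycle, dtype=float)
    src = np.linspace(0.0, 1.0, num=len(cycle))  # original sample positions
    dst = np.linspace(0.0, 1.0, num=norm_len)    # fixed-length positions
    return np.interp(dst, src, cycle)


def denormalise_cycle(pulse, f0_hz, fs_hz):
    """Stretch a fixed-length pulse back to the period implied by F0."""
    period = int(round(fs_hz / f0_hz))  # pitch period in samples
    src = np.linspace(0.0, 1.0, num=len(pulse))
    dst = np.linspace(0.0, 1.0, num=period)
    return np.interp(dst, src, pulse)


def build_codebook(cycles, norm_len=NORM_LEN):
    """Stack duration-normalised cycles into a (num_cycles, norm_len) array."""
    return np.stack([normalise_cycle(c, norm_len) for c in cycles])


if __name__ == "__main__":
    fs = 16000
    # Toy stand-ins for glottal cycles: raised-cosine pulses of two periods.
    cycles = [0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n) for n in (160, 100)]
    codebook = build_codebook(cycles)
    print(codebook.shape)  # (2, 400)
    pulse = denormalise_cycle(codebook[0], f0_hz=200.0, fs_hz=fs)
    print(len(pulse))  # 80
```

Normalising every cycle to one length is what makes the targets a fixed-size vector suitable for a feed-forward network; pitch is then reintroduced only at synthesis time, from the generated F0.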