• DocumentCode
    3574376
  • Title
    LP and TD-PSOLA-based incorporation of happiness in neutral speech using time-domain parameters
  • Author
    Sreenidhi, S. ; Rachel, G. Anushiya ; Vijayalakshmi, P. ; Nagarajan, T.
  • Author_Institution
    SSN Coll. of Eng., Chennai, India
  • fYear
    2014
  • Firstpage
    1158
  • Lastpage
    1162
  • Abstract
    Emotions express a person's internal state of being and are reflected in speech utterances. Emotions affect the time-domain characteristics of the speech signal, namely intonation patterns, speech rate, and the short-term energy function. Conventional text-to-speech (TTS) systems are built to produce speech utterances for a given text without any emotion, referred to as neutral speech. Building a TTS system that can produce speech utterances with an expected emotion is not a trivial task, because a separate speech corpus must be carefully collected for each emotion and the system built from it. Therefore, the current work focuses on incorporating happiness into neutral speech using signal processing algorithms. In this regard, neutral and happy speech are analyzed, and it is found that happiness can be perceived in certain emotive words in a sentence. Thus, to introduce happiness into neutral speech, these emotive keywords are identified and the above-mentioned time-domain parameters are modified. Linear prediction-based synthesis of happy speech is initially performed. To improve the quality of the synthesized speech, TD-PSOLA is then used. Subjective evaluation yields a mean opinion score of 2.05 (out of a maximum of 3) for happy speech synthesized using linear prediction and 2.53 for happy speech synthesized using TD-PSOLA.
  • Keywords
    speech processing; speech synthesis; time-domain analysis; LP; TD-PSOLA; TTS system; emotive keywords; happy speech; intonation patterns; linear prediction-based synthesis; neutral speech; person internal state; short-term energy function; signal processing algorithms; speech corpus; speech rate; speech signal; speech utterances; text-to-speech systems; time-domain parameters; time-domain pitch synchronous overlap-add-based synthesis techniques; Computers; Polynomials; Spectrogram; Speech; happiness incorporation; linear prediction; pitch contour; short-term energy
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Circuit, Power and Computing Technologies (ICCPCT), 2014 International Conference on
  • Print_ISBN
    978-1-4799-2395-3
  • Type
    conf
  • DOI
    10.1109/ICCPCT.2014.7054931
  • Filename
    7054931