  • DocumentCode
    1937363
  • Title
    Preservation, identification, and use of emotion in a text-to-speech system
  • Author
    Eide, E.
  • Author_Institution
    IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
  • fYear
    2002
  • fDate
    11-13 Sept. 2002
  • Firstpage
    127
  • Lastpage
    130
  • Abstract
    We are interested in the ability of a text-to-speech system to convey emotion. Towards that end, we examine the ability to preserve the underlying emotion present in a training corpus. We consider the emotions "lively", "sad", and "angry", and compare the ability to convey each of these in synthesized speech with the neutral speech baseline. We also look at the confusion rates encountered when listeners are asked to identify the emotional state in which the speaker appears to have been. We conclude from our experiments that a viable method for building a text-to-speech system which conveys a certain emotion is simply to collect data spoken in that emotional state. However, our experiments show that in order for us to achieve a given level of emotion perceived in the synthetic speech, we must record natural speech which has a higher level than that desired in the synthetic output. Finally, we discuss how an emotional TTS system might be used.
  • Keywords
    emotion recognition; speech recognition; speech synthesis; anger; confusion rates; data collection; emotion identification; emotion preservation; emotion use; emotional TTS system; liveliness; natural speech recording; sadness; synthesized speech; text-to-speech system; training corpus; Databases; Loudspeakers; Natural languages; Speech synthesis; System testing; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of 2002 IEEE Workshop on Speech Synthesis
  • Print_ISBN
    0-7803-7395-2
  • Type
    conf
  • DOI
    10.1109/WSS.2002.1224388
  • Filename
    1224388