• DocumentCode
    1325042
  • Title
    Reliable Pitch Marking of Affective Speech at Peaks or Valleys Using Restricted Dynamic Programming
  • Author
    Alías, Francesc ; Munné, Natália
  • Author_Institution
    Grup de Recerca en Tecnologies Media, La Salle (Univ. Ramon Llull), Barcelona, Spain
  • Volume
    12
  • Issue
    6
  • fYear
    2010
  • Firstpage
    481
  • Lastpage
    489
  • Abstract
    The affective communication channel plays a key role in multimodal human-computer interaction. In this context, the generation of realistic talking heads that express emotions in both appearance and speech is of great interest. The synthetic speech of talking heads is generally obtained from a text-to-speech (TTS) synthesizer. One of the dominant techniques for achieving high-quality synthetic speech is unit-selection TTS (US-TTS) synthesis. Affective US-TTS systems are driven by affectively annotated speech databases. Since affective speech involves higher acoustic variability than neutral speech, achieving trustworthy speech labeling is a more challenging task. To that end, this paper introduces a methodology for achieving reliable pitch marking on affective speech. The proposal adjusts the pitch marks to the signal peaks or valleys after applying a three-stage restricted dynamic programming algorithm. The methodology can be applied as a post-processing step to the output of any pitch determination or pitch marking algorithm (with any local criterion for locating pitch marks), or to a merging of such outputs. The experiments show that the proposed methodology significantly improves the results of the input state-of-the-art markers on affective speech.
  • Keywords
    dynamic programming; human computer interaction; speech synthesis; US-TTS systems; affective communication channel; affective speech; multimodal human-computer interaction; neutral speech; reliable pitch marking; restricted dynamic programming algorithm; text-to-speech synthesizer; trustworthy speech labeling; Algorithm design and analysis; Databases; Heuristic algorithms; Labeling; Proposals; Speech; pitch marking; speech analysis; unit-selection text-to-speech synthesis
  • fLanguage
    English
  • Journal_Title
    Multimedia, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type
    jour
  • DOI
    10.1109/TMM.2010.2051873
  • Filename
    5571903