  • DocumentCode
    178412
  • Title
    Dialogue context sensitive HMM-based speech synthesis
  • Author
    Tsiakoulis, Pirros ; Breslin, C. ; Gasic, M. ; Henderson, M. ; Kim, Dongho ; Szummer, M. ; Thomson, B. ; Young, S.
  • Author_Institution
    Eng. Dept., Univ. of Cambridge, Cambridge, UK
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    2554
  • Lastpage
    2558
  • Abstract
    The focus of this work is speech synthesis tailored to the needs of spoken dialogue systems. More specifically, the framework of HMM-based speech synthesis is used to train an emphatic voice that also considers dialogue context for decision tree state clustering. To achieve this, we designed and recorded a speech corpus comprising system prompts from human-computer interaction, as well as additional prompts for slot-level emphasis. This corpus, combined with a general-purpose text-to-speech corpus, was used to train voices using a) baseline context features, b) additional emphasis features, and c) additional dialogue context features. Both emphasis and dialogue context features are extracted from the dialogue act semantic representation. The voices were evaluated in pairs for dialogue appropriateness using a preference listening test. The results show that the emphatic voice is preferred to the baseline when emphasis markup is present, while the dialogue context-sensitive voice is preferred to the plain emphatic one when no emphasis markup is present and is preferred to the baseline in both cases. This demonstrates that including dialogue context features for decision tree state clustering significantly improves the quality of the synthetic voice for dialogue.
  • Keywords
    context-sensitive grammars; context-sensitive languages; decision trees; hidden Markov models; human computer interaction; interactive systems; natural language interfaces; natural language processing; speech synthesis; HMM-based speech synthesis; decision tree state clustering; dialogue act semantic representation; dialogue context features; dialogue context sensitive; emphasis features; emphatic voice; human-computer interaction; slot-level emphasis; speech corpus; spoken dialogue systems; text-to-speech; Context; Decision trees; Feature extraction; Hidden Markov models; Speech; Speech synthesis; Training; dialogue context-sensitive speech synthesis; emphatic speech synthesis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6854061
  • Filename
    6854061
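
The abstract states that emphasis and dialogue context features are extracted from the dialogue act semantic representation and then made available to decision tree state clustering. The sketch below illustrates one possible way to do this; it is not the authors' implementation. The dialogue act syntax, the per-word feature tuple (act type, slot name, emphasis flag), the `/K:` label field, the restaurant name, and all function names are hypothetical, and the question strings only mimic the style of an HTS question file.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical dialogue act syntax: act_type(slot="value",slot="value",...).
# Values containing commas are not supported by this simple parser.
DA_PATTERN = re.compile(r"^\s*(\w+)\s*\((.*)\)\s*$")


def parse_dialogue_act(da: str) -> Tuple[str, Dict[str, str]]:
    """Split a dialogue act string into its act type and slot-value pairs."""
    match = DA_PATTERN.match(da)
    if match is None:
        raise ValueError(f"not a dialogue act: {da!r}")
    act_type, body = match.group(1), match.group(2)
    slots: Dict[str, str] = {}
    for item in (part.strip() for part in body.split(",")):
        if not item:
            continue
        slot, _, value = item.partition("=")
        slots[slot.strip()] = value.strip().strip('"')
    return act_type, slots


def word_context_features(
    prompt: str, da: str, emphasised: Set[str]
) -> List[Tuple[str, str, str, int]]:
    """Tag each prompt word with (word, act type, slot name, emphasis flag).

    A word takes a slot name when it is part of that slot's value in the
    dialogue act; the emphasis flag is 1 only for slots in `emphasised`.
    Words outside any slot value get the slot name "none".
    """
    act_type, slots = parse_dialogue_act(da)
    words = prompt.split()
    normalised = [w.lower().strip(".,?!") for w in words]
    tags = [(w, act_type, "none", 0) for w in words]
    for slot, value in slots.items():
        value_words = value.lower().split()
        n = len(value_words)
        if n == 0:
            continue  # e.g. request(food) has a slot but no value to align
        for start in range(len(words) - n + 1):
            if normalised[start:start + n] == value_words:
                flag = int(slot in emphasised)
                for i in range(start, start + n):
                    tags[i] = (words[i], act_type, slot, flag)
    return tags


def label_suffix(act_type: str, slot: str, emphasis: int) -> str:
    """Hypothetical extra field appended to each phone's full-context label."""
    return f"/K:{act_type}_{slot}_{emphasis}"


def question_set(act_types: List[str], slot_names: List[str]) -> List[str]:
    """HTS-style questions over the /K: field (assumes no '_' inside names)."""
    questions = [f'QS "DA_Act=={a}" {{*/K:{a}_*}}' for a in act_types]
    questions += [f'QS "DA_Slot=={s}" {{*/K:*_{s}_?}}' for s in slot_names]
    questions.append('QS "Emphasis" {*/K:*_1}')
    return questions


if __name__ == "__main__":
    da = 'inform(name="Lucky Star",food="Chinese")'
    for word, act, slot, emph in word_context_features(
        "Lucky Star serves Chinese food.", da, {"food"}
    ):
        print(word, label_suffix(act, slot, emph))
    print("\n".join(question_set(["inform", "confirm"], ["name", "food"])))
```

Running the module prints each word with its hypothetical label suffix, followed by the generated questions. In an HTS-style setup, such a suffix would be appended to every phone label within the word, and questions like these would be added to the question set so the clustering trees can split on act type, slot, or emphasis. The record does not say which encoding the authors actually used.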