• DocumentCode
    1365266
  • Title

    HMM-based stressed speech modeling with application to improved synthesis and recognition of isolated speech under stress

  • Author

    Bou-Ghazale, Sahar E. ; Hansen, John H. L.

  • Author_Institution
    Robust Speech Processing Lab., Duke Univ., Durham, NC, USA
  • Volume
    6
  • Issue
    3
  • fYear
    1998
  • fDate
    5/1/1998
  • Firstpage
    201
  • Lastpage
    216
  • Abstract
    A novel approach is proposed for modeling speech parameter variations between neutral and stressed conditions and employed in a technique for stressed speech synthesis and recognition. The proposed method consists of modeling the variations in pitch contour, voiced speech duration, and average spectral structure using hidden Markov models (HMMs). While HMMs have traditionally been used for recognition applications, here they are employed to statistically model the characteristics needed for generating pitch contour and spectral perturbation contour patterns to modify the speaking style of isolated neutral words. The proposed HMMs are both speaker- and word-independent, but unique to each speaking style. While the modeling scheme is applicable to a variety of stress and emotional speaking styles, the evaluations presented focus on angry speech, the Lombard (1911) effect, and loud spoken speech in three areas. First, formal subjective listener evaluations of the modified speech confirm the HMMs' ability to capture the parameter variations under stressed conditions. Second, an objective evaluation using a separately formulated stress classifier is employed to assess the presence of stress imparted on the synthetic speech. Finally, the stressed speech is also used for training and shown to measurably improve the performance of an HMM-based stressed speech recognizer.
  • Keywords
    hidden Markov models; spectral analysis; speech recognition; speech synthesis; statistical analysis; HMM; HMM-based stressed speech modeling; Lombard effect; angry speech; average spectral structure; emotional speaking styles; formal subjective listener evaluations; hidden Markov models; isolated speech; loud spoken speech; neutral conditions; objective evaluation; pitch contour; speaker-independent model; spectral perturbation contour patterns; speech parameter variations; statistical model; stress classifier; stressed conditions; stressed speech recognition; stressed speech synthesis; training; voiced speech duration; word-independent model; Character recognition; Hidden Markov models; Laboratories; Pattern recognition; Robustness; Speech analysis; Speech processing; Speech recognition; Speech synthesis; Stress;
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type

    jour

  • DOI
    10.1109/89.668815
  • Filename
    668815