HMM-based stressed speech modeling with application to improved synthesis and recognition of isolated speech under stress

Author

Bou-Ghazale, Sahar E.; Hansen, John H. L.

Author_Institution

Robust Speech Processing Lab., Duke Univ., Durham, NC, USA

Volume

6

Issue

3

fYear

1998

fDate

5/1/1998

Firstpage

201

Lastpage

216

Abstract

A novel approach is proposed for modeling speech parameter variations between neutral and stressed conditions and employed in a technique for stressed speech synthesis and recognition. The proposed method consists of modeling the variations in pitch contour, voiced speech duration, and average spectral structure using hidden Markov models (HMMs). While HMMs have traditionally been used for recognition applications, here they are employed to statistically model the characteristics needed for generating pitch contour and spectral perturbation contour patterns to modify the speaking style of isolated neutral words. The proposed HMMs are both speaker- and word-independent, but unique to each speaking style. While the modeling scheme is applicable to a variety of stress and emotional speaking styles, the evaluations presented focus on angry speech, the Lombard (1911) effect, and loud spoken speech in three areas. First, formal subjective listener evaluations of the modified speech confirm the HMMs' ability to capture the parameter variations under stressed conditions. Second, an objective evaluation using a separately formulated stress classifier is employed to assess the presence of stress imparted on the synthetic speech. Finally, the stressed speech is also used for training and shown to measurably improve the performance of an HMM-based stressed speech recognizer.

Keywords

hidden Markov models; spectral analysis; speech recognition; speech synthesis; statistical analysis; HMM; HMM-based stressed speech modeling; Lombard effect; angry speech; average spectral structure; emotional speaking styles; formal subjective listener evaluations; isolated speech; loud spoken speech; neutral conditions; objective evaluation; pitch contour; speaker-independent model; spectral perturbation contour patterns; speech parameter variations; statistical model; stress classifier; stressed conditions; stressed speech recognition; stressed speech synthesis; training; voiced speech duration; word-independent model; Character recognition; Laboratories; Pattern recognition; Robustness; Speech analysis; Speech processing; Stress

fLanguage

English

Journal_Title

Speech and Audio Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1063-6676

Type

jour

DOI

10.1109/89.668815

Filename

668815