Title :
Nonlinear feature based classification of speech under stress
Author :
Zhou, Guojun ; Hansen, John H L ; Kaiser, James F.
Author_Institution :
Robust Speech Process. Lab., Colorado Univ., Boulder, CO, USA
fDate :
3/1/2001
Abstract :
Studies have shown that variability introduced by stress or emotion can severely reduce speech recognition accuracy. Techniques for detecting or assessing the presence of stress could help improve the robustness of speech recognition systems. Although some acoustic variables derived from linear speech production theory have been investigated as indicators of stress, they are not always consistent. Three new features derived from the nonlinear Teager (1980) energy operator (TEO) are investigated for stress classification. It is believed that the TEO-based features are better able to reflect the nonlinear airflow structure of speech production under adverse stressful conditions. The proposed features are TEO-decomposed FM variation (TEO-FM-Var), normalized TEO autocorrelation envelope area (TEO-Auto-Env), and critical-band-based TEO autocorrelation envelope area (TEO-CB-Auto-Env). The features are evaluated on the task of stress classification using simulated and actual stressed speech, and it is shown that the TEO-CB-Auto-Env feature substantially outperforms traditional pitch and mel-frequency cepstrum coefficient (MFCC) features. Performance of the TEO-based features is maintained in both text-dependent and text-independent models, while performance of the traditional features degrades in text-independent models. Overall neutral versus stress classification rates are also shown to be more consistent across different stress styles.
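The abstract names the Teager energy operator but does not state its formula. As a minimal sketch, assuming the commonly used discrete-time form psi[n] = x[n]^2 - x[n-1]*x[n+1], the operator can be computed as below. The function name `teager_energy` and the test tone are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the TEO-FM-Var, TEO-Auto-Env, or TEO-CB-Auto-Env features themselves.

```python
import math


def teager_energy(x):
    """Discrete Teager energy profile: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, since the first and last samples
    have no neighbour on one side. Shorter inputs give an empty list.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test signal: a pure tone with amplitude 2 and
# digital frequency 0.3 rad/sample (values chosen for illustration only).
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(64)]
profile = teager_energy(tone)
```

For a pure tone the profile is constant and equal to amplitude^2 * sin^2(omega), so any variation in the profile of a real speech frame reflects amplitude or frequency modulation rather than the carrier itself.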
Keywords :
cepstral analysis; correlation methods; mathematical operators; signal classification; speech recognition; TEO autocorrelation envelope area; TEO-decomposed FM variation; acoustic variables; actual stressed speech; critical band; emotion; linear speech production theory; mel-frequency cepstrum coefficients; nonlinear Teager energy operator; nonlinear airflow structure; nonlinear feature based speech classification; normalized TEO autocorrelation envelope area; simulated stressed speech; speech production; speech recognition systems; stress classification rates; text-dependent models; text-independent models; Acoustic noise; Autocorrelation; Laboratories; Multitasking; Robustness; Speech analysis; Speech processing; Speech recognition; Stress; Working environment noise;
Journal_Title :
Speech and Audio Processing, IEEE Transactions on