Title :
Nonstationary speech analysis using neural prediction
Author :
Zhu, Bin ; Micheli-Tzanakou, Evangelia
Author_Institution :
Dept. of Biomed. Eng., Rutgers Univ., Piscataway, NJ, USA
Abstract :
Discusses extracting dynamic features of individual speakers from short speech segments for robust recognition. As a method of homomorphic signal processing, cepstrum analysis can separate the excitation from the impulse response of the vocal channels when applied to speech signals. For short-time spectrum analysis, overlapping windows are used to divide the speech into many frames, and one cepstrum vector is obtained for each window. The speech signal is assumed to be stationary within each frame (about 30 msec). Over longer time intervals, however, speech is essentially nonstationary, so the dynamic changes between frames must be considered. Conventional methods often use only the static features of the short-time cepstrum. A neural network can be seen as a nonlinear dynamic system, which may express both the static and dynamic features of the signal at hand. For this purpose, a neural prediction network was designed to extract the inter- and intraframe correlations of cepstrum vectors, so as to obtain robust features of individual speakers from very short speech epochs.
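The framing step described in the abstract can be illustrated with a minimal sketch, which computes one real cepstrum vector per overlapping window. Only the 30 ms frame length comes from the abstract. The Hamming window, the 10 ms hop, the number of retained coefficients, and the function names are assumptions made for illustration. The frame-to-frame difference shown at the end is a simple linear stand-in for interframe dynamics, not the authors' neural prediction network.

```python
import numpy as np


def short_time_cepstrum(signal, sample_rate, frame_ms=30.0, hop_ms=10.0, n_coeffs=20):
    """Return one real cepstrum vector per overlapping frame.

    frame_ms follows the ~30 ms frame mentioned in the abstract; hop_ms,
    n_coeffs, and the Hamming window are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")

    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    cepstra = np.empty((n_frames, n_coeffs))

    for i in range(n_frames):
        start = i * hop_len
        frame = signal[start:start + frame_len] * window
        # Real cepstrum: inverse FFT of the log magnitude spectrum.
        log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)
        cepstrum = np.fft.irfft(log_mag, n=frame_len)
        # Low-quefrency coefficients mainly reflect the vocal-channel response.
        cepstra[i] = cepstrum[:n_coeffs]
    return cepstra


def frame_differences(cepstra):
    """First-order difference between consecutive cepstrum vectors.

    A crude proxy for interframe dynamics, NOT the neural predictor
    described in the abstract.
    """
    return np.diff(cepstra, axis=0)


# Usage with a synthetic one-second signal sampled at 8 kHz (assumed values).
rng = np.random.default_rng(0)
demo_signal = rng.standard_normal(8000)
demo_cepstra = short_time_cepstrum(demo_signal, sample_rate=8000)
demo_deltas = frame_differences(demo_cepstra)
```

Each row of `demo_cepstra` is a static feature vector for one frame, and each row of `demo_deltas` describes the change between two adjacent frames, which is the kind of interframe information that static short-time features leave out.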
Keywords :
feature extraction; medical signal processing; neural nets; spectral analysis; speech processing; 30 ms; homomorphic signal processing method; individual speakers; interframe correlations; intraframe correlations; neural prediction; nonlinear dynamic system; nonstationary speech analysis; short-time cepstrum; static features; stationary speech signal; very short speech epochs; vocal channels impulse response; Cepstral analysis; Cepstrum; Feature extraction; Nonlinear dynamical systems; Robustness; Signal analysis; Signal processing; Speech analysis; Speech processing; Speech recognition;
Journal_Title :
Engineering in Medicine and Biology Magazine, IEEE