Regional Information Center for Science and Technology - Formant estimation algorithm based on pole focusing offering improved noise tolerance and feature resolution

Abstract :

The ability to measure the centre frequencies of areas of resonance (formants) in the short-time power spectrum of speech is of paramount importance in the recognition of voiced speech sounds in a feature-extraction-based continuous speech recognition system. Additionally, the provision of a tracking algorithm, by which the loci of formants with respect to time can be estimated, yields formant transition information which helps identify phonetic features which are of short duration. Noise robustness in formant estimation is an essential attribute for recognition systems which are used in the office environment and in military applications. The novel technique presented in the paper provides a noise-robust method of extracting formant centre-frequency information from the short-time speech spectrum, and consequently improves the signal/noise performance of the associated formant tracking algorithm. Formant estimation is based on modelling the vocal tract frequency response using linear prediction coding (LPC) techniques. However, the estimation of formant centre frequency in any given analysis frame is greatly improved by employing off-axis spectral estimation coupled with a progressive increase in vocal tract model order, which together provide vocal tract pole enhancement. Finally, the use of a formant weighting filter function applied within each frame aids in conferring high noise immunity to the estimation process. The pole focusing technique is shown to offer an improvement of at least 14 dB in signal/noise immunity as a formant frequency estimator over conventional LPC-based spectral estimation. In its application to formant tracking, it is shown that the technique also offers improved separation of formants which tend to merge, besides offering a general improvement in the provision of formant detail, in particular with regard to weak nasal formants. 
An additional advantage of the technique is its relative insensitivity to choice of vocal tract model order, which produces an inherently speaker-independent formant estimation algorithm.
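
The abstract does not give implementation details, so the following is only a minimal sketch of one ingredient it names: evaluating an LPC all-pole model off the unit circle so that the contour passes closer to the vocal tract poles and the resonance peaks sharpen. It assumes autocorrelation-method LPC solved by the Levinson-Durbin recursion. The function names, the Hamming window, the 512-point frequency grid and the example radius of 0.98 are illustrative choices and are not taken from the paper. The progressive increase in model order and the formant weighting filter are not reproduced here.

```python
import cmath
import math


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of a Hamming-windowed frame."""
    n = len(frame)
    w = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
         for i, x in enumerate(frame)]
    return [sum(w[i] * w[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns [1, a1, ..., ap] of A(z)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error
    return a


def all_pole_magnitude(a, radius=1.0, n_points=512):
    """|1/A(z)| sampled on z = radius * exp(j*w) for 0 <= w < pi.

    radius = 1.0 gives the conventional LPC spectrum; a radius slightly
    below 1 (an assumed, illustrative setting) is the off-axis case.
    """
    mags = []
    for m in range(n_points):
        z = cmath.rect(radius, math.pi * m / n_points)
        A = sum(ak * z ** (-k) for k, ak in enumerate(a))
        mags.append(1.0 / abs(A))
    return mags


def peak_frequencies(mags, fs):
    """Frequencies in Hz of the interior local maxima of the sampled spectrum."""
    n = len(mags)
    return [0.5 * fs * m / n for m in range(1, n - 1)
            if mags[m - 1] < mags[m] >= mags[m + 1]]
```

For a frame `x` sampled at `fs` Hz, candidate formant frequencies under these assumptions would be `peak_frequencies(all_pole_magnitude(levinson_durbin(autocorrelation(x, p), p), radius=0.98), fs)` for a chosen model order `p`. Comparing the result with `radius=1.0` shows how moving the evaluation contour towards the poles sharpens the peaks.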

Keywords :

speech recognition; centre frequencies; feature resolution; feature-extraction-based continuous speech recognition system; formant estimation algorithm; formant transition information; improved noise tolerance; military applications; office environment; phonetic features; pole focusing; recognition systems; short-time power spectrum; signal/noise performance; speech; tracking algorithm; voiced speech sounds