Title :
On 450-600 b/s natural sounding speech coding
Author :
Cheng, Yan-Ming ; O'Shaughnessy, Douglas
Author_Institution :
INRS Telecommun., Verdun, Que., Canada
fDate :
4/1/1993
Abstract :
Algorithms for encoding speech with good intelligibility and naturalness at very low rates are studied. Naturalness is retained by accurately encoding the speech excitation information of an LPC (linear predictive coding) model. A glottal ARX (autoregressive with exogenous input) technique is used to obtain a high-quality model of the speech signal. A large reduction in coding rate is achieved through short-term temporal compression of the speech and vector quantization. Application of traditional vector quantization to the temporal decomposition output is discussed, with consideration of distortion measures and codebook generation. Based on properties of short-term temporal decomposition, finite-state vector quantization is introduced to further decrease the coding rate. A problem associated with this technique, estimating a state transition matrix from incomplete data, is treated. The general result is that practical coders operating in the range of 450-600 b/s, with a delay of about 200 ms and natural-sounding output speech, can be designed.
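The abstract does not describe the finite-state VQ design in detail. As a rough illustration of why finite-state VQ lowers the rate, here is a minimal generic sketch, not the authors' coder. It assumes a labeled-transition scheme: the state is the previously chosen codeword, each state may use only its M most frequent successors (estimated from a transition-count matrix), and only the position within that small subcodebook is sent, costing log2(M) instead of log2(K) bits per vector. The add-one smoothing is an assumed stand-in for the paper's treatment of incomplete data. All function names, sizes, and the random codebook are hypothetical.

```python
import numpy as np

def nearest(codebook, x, candidates):
    """Return the codebook index among `candidates` closest to x (squared error)."""
    d = np.sum((codebook[candidates] - x) ** 2, axis=1)
    return int(candidates[np.argmin(d)])

def build_next_state_codebooks(train_indices, K, M):
    """Estimate a K x K transition-count matrix from training index sequences
    and keep, for each state (= previous codeword), its M most frequent successors."""
    # Assumption: add-one smoothing so transitions unseen in training still get a count.
    counts = np.ones((K, K))
    for seq in train_indices:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    # Sort each row by descending count (stable, so ties keep index order).
    return np.argsort(-counts, axis=1, kind="stable")[:, :M]

def encode(vectors, codebook, subcb, init_state=0):
    """Encode vectors as positions in the current state's subcodebook."""
    state, symbols = init_state, []
    for x in vectors:
        idx = nearest(codebook, x, subcb[state])
        # Only the position inside the M-entry subcodebook is transmitted.
        symbols.append(int(np.where(subcb[state] == idx)[0][0]))
        state = idx  # next state is the chosen codeword, known to the decoder too
    return symbols

def decode(symbols, codebook, subcb, init_state=0):
    """Rebuild vectors by tracking the same state sequence as the encoder."""
    state, out = init_state, []
    for s in symbols:
        idx = int(subcb[state][s])
        out.append(codebook[idx])
        state = idx
    return np.array(out)

rng = np.random.default_rng(0)
K, M, dim = 16, 4, 2  # hypothetical sizes: 4 bits per vector unconstrained, 2 bits with FSVQ
codebook = rng.normal(size=(K, dim))
train = [list(rng.integers(0, K, size=50)) for _ in range(20)]
subcb = build_next_state_codebooks(train, K, M)
x = rng.normal(size=(10, dim))
sym = encode(x, codebook, subcb)
y = decode(sym, codebook, subcb)
```

Because the decoder derives each next state from the codeword it has just decoded, no side information about the state is transmitted; the price is that the encoder may be forced away from the globally nearest codeword when it is not among the current state's successors.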
Keywords :
linear predictive coding; speech coding; vector quantisation; 450 to 600 bit/s; LPC model; autoregressive with exogenous input; codebook generation; coding rate; distortion measures; finite-state vector quantization; glottal ARX technique; natural-sounding output speech; short-term temporal compression; speech compression; state transition matrix; vector quantization; very low bit rate; Bit rate; Distortion measurement; Encoding; Linear predictive coding; Speech analysis; Speech coding; Speech synthesis; Testing; Time measurement; Vector quantization
Journal_Title :
Speech and Audio Processing, IEEE Transactions on