Title :
Improving the performance of HMM-based very low bit rate speech coding
Author :
Hoshiya, Takahiro ; Sako, Shinji ; Zen, Heiga ; Tokuda, Keiichi ; Masuko, Takashi ; Kobayashi, Takao ; Kitamura, Tadashi
Author_Institution :
Dept. of Comput. Sci., Nagoya Inst. of Technol., Japan
Abstract :
In this paper, we define an F0 quantization scheme for a very low bit rate speech coder based on hidden Markov models (HMMs). In the coding system, the encoder carries out phoneme recognition and transmits phoneme indices, state durations and F0 information to the decoder. In the decoder, phoneme HMMs are concatenated according to the phoneme indices, and a sequence of mel-cepstral coefficient vectors is generated from the concatenated HMM. Finally, we obtain synthetic speech by driving the MLSA (mel log spectrum approximation) filter with the mel-cepstral coefficients and F0 information. In addition to the F0 quantization, we investigate encoding methods for the other parameters that reduce the bit rate while keeping the subjective speech quality. A subjective listening test shows that the performance of the proposed coder at about 100∼150 bit/s is superior to that of a VQ-based vocoder at 600 bit/s (mel-cepstrum: 6 bit/frame×50 frame/s, F0: 6 bit/frame×50 frame/s).
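The 600 bit/s figure quoted for the VQ-based baseline follows directly from the per-frame allocations given in parentheses. The short Python sketch below only reproduces that arithmetic; the function and variable names are illustrative assumptions and do not come from the paper.

```python
def stream_bit_rate(bits_per_frame: int, frames_per_second: int) -> int:
    """Bit rate in bit/s of one parameter stream (hypothetical helper)."""
    return bits_per_frame * frames_per_second


FRAME_RATE = 50  # frame/s, as stated in the abstract

# Baseline VQ-based vocoder: 6 bit/frame each for mel-cepstrum and F0.
mel_cepstrum_rate = stream_bit_rate(6, FRAME_RATE)  # 300 bit/s
f0_rate = stream_bit_rate(6, FRAME_RATE)            # 300 bit/s
baseline_bit_rate = mel_cepstrum_rate + f0_rate     # 600 bit/s

# Approximate operating range reported for the proposed coder, in bit/s.
proposed_range = (100, 150)

# How many times larger the baseline rate is at each end of that range.
ratios = tuple(baseline_bit_rate / rate for rate in proposed_range)
```

Under these figures the baseline uses roughly four to six times the bit rate of the proposed coder.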
Keywords :
cepstral analysis; decoding; hidden Markov models; speech coding; speech recognition; speech synthesis; vocoders; F0 quantization scheme; MLSA filter; decoder; encoder; encoding methods; hidden Markov model; mel log spectrum approximation; mel-cepstral coefficient vectors; performance; phoneme HMM concatenation; phoneme indices; phoneme recognition; speech coder; state durations; subjective listening test; subjective speech quality; synthetic speech; very low bit rate speech coding; Bit rate; Concatenated codes; Decoding; Encoding; Hidden Markov models; Information filtering; Information filters; Quantization; Speech coding; Testing;
Conference_Title :
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
Print_ISBN :
0-7803-7663-3
DOI :
10.1109/ICASSP.2003.1198902