• DocumentCode
    3433337
  • Title
    A very low bit rate speech coder using HMM-based speech recognition/synthesis techniques
  • Author
    Tokuda, Keiichi ; Masuko, Takashi ; Hiroi, Jun ; Kobayashi, Takao ; Kitamura, Tadashi
  • Author_Institution
    Dept. of Comput. Sci., Nagoya Inst. of Technol., Japan
  • Volume
    2
  • fYear
    1998
  • fDate
    12-15 May 1998
  • Firstpage
    609
  • Abstract
    This paper presents a very low bit rate speech coder based on hidden Markov models (HMMs). The encoder performs phoneme recognition and transmits phoneme indexes, state durations, and pitch information to the decoder. In the decoder, phoneme HMMs are concatenated according to the phoneme indexes, and a sequence of mel-cepstral coefficient vectors is generated from the concatenated HMM using an ML-based speech parameter generation technique. Finally, synthetic speech is obtained by exciting the MLSA (mel log spectrum approximation) filter, whose coefficients are given by the mel-cepstral coefficients, according to the pitch information. A subjective listening test shows that the performance of the proposed coder at about 150 bit/s (for test data including 26% silence) is comparable to that of a VQ-based vocoder at 400 bit/s (= 8 bit/frame × 50 frame/s), with pitch left unquantized in both coders.
  • Keywords
    cepstral analysis; hidden Markov models; speech coding; speech recognition; speech synthesis; vector quantisation; vocoders; 150 bit/s; HMM-based speech recognition; HMM-based speech synthesis; ML-based speech parameter generation technique; MLSA filter; VLBR; concatenated HMM; decoder; hidden Markov model; mel log spectrum approximation; mel-cepstral coefficient vectors; phoneme indexes transmission; phoneme recognition; pitch information transmission; state durations transmission; subjective listening test; synthetic speech; very low bit rate speech coder; Bit rate; Concatenated codes; Decoding; Hidden Markov models; Information filtering; Information filters; Quantization; Speech; Testing; Vocoders;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on
  • Conference_Location
    Seattle, WA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-4428-6
  • Type
    conf
  • DOI
    10.1109/ICASSP.1998.675338
  • Filename
    675338
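
The bit-rate comparison in the Abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the figures (8 bit/frame, 50 frame/s, about 150 bit/s) are taken from the Abstract, while the function and variable names are hypothetical and not from the paper. It does not model how the coder allocates bits among phoneme indexes, state durations, and pitch, which the record does not specify.

```python
# Bit-rate arithmetic for the figures quoted in the Abstract.
# Names are illustrative assumptions; only the numbers come from the record.

VQ_BITS_PER_FRAME = 8      # VQ-based vocoder: 8 bit/frame (from the Abstract)
FRAMES_PER_SECOND = 50     # 50 frame/s (from the Abstract)
HMM_CODER_BPS = 150        # proposed coder: "about 150 bit/s" (approximate)


def frame_based_bit_rate(bits_per_frame: int, frames_per_second: int) -> int:
    """Bit rate of a coder that sends a fixed number of bits per frame."""
    return bits_per_frame * frames_per_second


vq_bps = frame_based_bit_rate(VQ_BITS_PER_FRAME, FRAMES_PER_SECOND)
ratio = HMM_CODER_BPS / vq_bps

print(vq_bps)  # 400
print(ratio)   # 0.375
```

Under these figures the proposed coder runs at roughly three eighths of the VQ-based vocoder's spectral bit rate. Since the 150 bit/s value is stated as approximate and depends on the test data, the ratio should be read as indicative rather than exact.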