• DocumentCode
    3340286
  • Title
    An efficient and scalable 2D DCT-based feature coding scheme for remote speech recognition
  • Author
    Zhu, Qifeng ; Alwan, Abeer
  • Author_Institution
    Dept. of Electr. Eng., California Univ., Los Angeles, CA, USA
  • Volume
    1
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    113
  • Abstract
    A 2D DCT-based approach to compressing acoustic features for remote speech recognition applications is presented. The coding scheme involves computing a 2D DCT on blocks of feature vectors, followed by uniform scalar quantization, run-length coding, and Huffman coding. Digit recognition experiments were conducted in which training was done with unquantized cepstral features from clean speech, and testing used the same features, at various levels of acoustic noise, after coding and decoding with 2D DCT and entropy coding. The coding scheme results in recognition performance comparable to that obtained with unquantized features at low bit rates. 2D DCT coding of MFCCs (mel-frequency cepstral coefficients) together with a method for variable frame rate analysis (Zhu and Alwan, 2000) and peak isolation (Strope and Alwan, 1997) maintains the noise robustness of these algorithms at low SNRs even at 624 bps. The low-complexity scheme is scalable, resulting in graceful degradation in performance with decreasing bit rate.
  • Keywords
    Huffman codes; acoustic noise; cepstral analysis; data compression; decoding; discrete cosine transforms; entropy codes; runlength codes; speech coding; speech recognition; transform coding; 624 bit/s; Huffman coding; MFCCs; acoustic features; acoustic noise; clean speech; decoding; digit recognition experiments; efficient scalable 2D DCT-based feature coding scheme; entropy coding; feature vectors; low-complexity scheme; mel-frequency cepstral coefficients; noise robustness; peak isolation; recognition performance; remote speech recognition; runlength coding; uniform scalar quantization; unquantized cepstral features; variable frame rate analysis; Acoustic applications; Acoustic noise; Bit rate; Cepstral analysis; Discrete cosine transforms; Huffman coding; Mel frequency cepstral coefficient; Quantization; Speech enhancement; Speech recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01). 2001 IEEE International Conference on
  • Conference_Location
    Salt Lake City, UT
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7041-4
  • Type
    conf
  • DOI
    10.1109/ICASSP.2001.940780
  • Filename
    940780
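
The pipeline named in the Abstract (2D DCT on a block of feature vectors, uniform scalar quantization, run-length coding) can be illustrated with a minimal sketch. This is not the authors' implementation: the block size, the step size, the row-by-row scan order, and all function names are assumptions made for illustration, and the final Huffman coding stage is omitted.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def dct2(block):
    """2D DCT of a (frames x coefficients) block: C_t X C_f^T."""
    ct = dct_matrix(len(block))
    cf = dct_matrix(len(block[0]))
    return matmul(matmul(ct, block), transpose(cf))

def idct2(coeffs):
    """Inverse of dct2 (the basis is orthonormal): C_t^T Y C_f."""
    ct = dct_matrix(len(coeffs))
    cf = dct_matrix(len(coeffs[0]))
    return matmul(matmul(transpose(ct), coeffs), cf)

def quantize(coeffs, step):
    """Uniform scalar quantization: one step size for every coefficient (assumed)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

def dequantize(indices, step):
    return [[v * step for v in row] for row in indices]

def run_length_encode(values):
    """Encode as (zero_run, value) pairs; trailing zeros become a final (run, 0)."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:
        pairs.append((run, 0))
    return pairs

def run_length_decode(pairs):
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        if v != 0:
            out.append(v)
    return out

def encode_block(block, step):
    """Transform, quantize, then run-length code in row-by-row order (assumed scan)."""
    indices = quantize(dct2(block), step)
    flat = [v for row in indices for v in row]
    return run_length_encode(flat)

def decode_block(pairs, n_frames, n_coeffs, step):
    flat = run_length_decode(pairs)
    indices = [flat[r * n_coeffs:(r + 1) * n_coeffs] for r in range(n_frames)]
    return idct2(dequantize(indices, step))
```

A larger step size zeroes more coefficients and lengthens the zero runs, which lowers the bit rate at the cost of reconstruction accuracy; this is one plausible reading of the scalability the Abstract describes, not a confirmed detail of the paper.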