An efficient and scalable 2D DCT-based feature coding scheme for remote speech recognition

Author

Zhu, Qifeng; Alwan, Abeer

Author_Institution

Department of Electrical Engineering, University of California, Los Angeles, CA, USA

Volume

1

fYear

2001

fDate

2001

Firstpage

113

Abstract

A 2D DCT-based approach to compressing acoustic features for remote speech recognition applications is presented. The coding scheme computes a 2D DCT on blocks of feature vectors, followed by uniform scalar quantization, run-length coding, and Huffman coding. In digit recognition experiments, training used unquantized cepstral features from clean speech, and testing used the same features after 2D DCT and entropy coding and decoding, at various levels of acoustic noise. At low bit rates, the coding scheme yields recognition performance comparable to that obtained with unquantized features. 2D DCT coding of MFCCs (mel-frequency cepstral coefficients), combined with a method for variable frame rate analysis (Zhu and Alwan, 2000) and peak isolation (Strope and Alwan, 1997), maintains the noise robustness of these algorithms at low SNRs even at 624 bps. The low-complexity scheme is scalable, resulting in graceful degradation in performance as the bit rate decreases.
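The coding pipeline named in the abstract (2D DCT on a block of feature vectors, uniform scalar quantization, run-length coding, Huffman coding) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 8-frame by 6-coefficient block, the step sizes, the anti-diagonal scan order, the JPEG-style (zero-run, level) pairs with an end-of-block marker, and building the Huffman table from each block's own symbols are all hypothetical choices. A deployed codec would more plausibly use a fixed Huffman table trained offline.

```python
import heapq
import math
from collections import Counter
from itertools import count


def _dct_basis(n, k, i):
    """Orthonormal DCT-II basis value for frequency k at sample i."""
    scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return scale * math.cos(math.pi * (i + 0.5) * k / n)


def dct_1d(x):
    n = len(x)
    return [sum(v * _dct_basis(n, k, i) for i, v in enumerate(x)) for k in range(n)]


def idct_1d(X):
    # Transpose of the orthonormal DCT-II matrix, i.e. its exact inverse.
    n = len(X)
    return [sum(v * _dct_basis(n, k, i) for k, v in enumerate(X)) for i in range(n)]


def _separable(block, f):
    """Apply 1-D transform f along the cepstral axis, then along the time axis."""
    rows = [f(frame) for frame in block]
    cols = [f(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def dct_2d(block):
    return _separable(block, dct_1d)


def idct_2d(coeffs):
    return _separable(coeffs, idct_1d)


def scan_order(t, c):
    # Hypothetical scan: low-frequency (top-left) coefficients first, by anti-diagonal.
    cells = [(i, j) for i in range(t) for j in range(c)]
    return sorted(cells, key=lambda ij: (ij[0] + ij[1], ij[0]))


def run_length(levels):
    """Emit (zero_run, level) pairs for nonzero levels, then an end-of-block marker."""
    pairs, run = [], 0
    for v in levels:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the marker
    return pairs


def run_length_decode(pairs, n):
    out = []
    for p in pairs:
        if p == "EOB":
            break
        run, v = p
        out.extend([0] * run)
        out.append(v)
    return out + [0] * (n - len(out))


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode_block(block, step):
    """2D DCT, then uniform scalar quantization with the given step, then run-length pairs."""
    t, c = len(block), len(block[0])
    coeffs = dct_2d(block)
    levels = [round(coeffs[i][j] / step) for i, j in scan_order(t, c)]
    return run_length(levels)


def decode_block(pairs, t, c, step):
    levels = run_length_decode(pairs, t * c)
    coeffs = [[0.0] * c for _ in range(t)]
    for (i, j), q in zip(scan_order(t, c), levels):
        coeffs[i][j] = q * step
    return idct_2d(coeffs)


if __name__ == "__main__":
    # Hypothetical smooth 8 x 6 block standing in for a block of MFCC vectors.
    block = [[math.cos(0.3 * t + 0.5 * c) for c in range(6)] for t in range(8)]
    for step in (0.05, 0.5):
        pairs = encode_block(block, step)
        code = huffman_code(pairs)
        bits = sum(len(code[s]) for s in pairs)
        rec = decode_block(pairs, 8, 6, step)
        err = max(abs(a - b) for ra, rb in zip(block, rec) for a, b in zip(ra, rb))
        print(f"step={step}: {len(pairs)} symbols, {bits} bits, max error {err:.3f}")
```

Because the transform is orthonormal, the reconstruction error is bounded by the coefficient quantization error, so a larger step trades accuracy for fewer nonzero levels and fewer coded symbols. Varying the step size is one plausible way to obtain the kind of bit-rate scalability the abstract describes.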

Keywords

Huffman codes; acoustic noise; cepstral analysis; data compression; decoding; discrete cosine transforms; entropy codes; runlength codes; speech coding; speech recognition; transform coding; 624 bit/s; Huffman coding; MFCCs; acoustic features; acoustic noise; clean speech; decoding; digit recognition experiments; efficient scalable 2D DCT-based feature coding scheme; entropy coding; feature vectors; low-complexity scheme; mel-frequency cepstral coefficients; noise robustness; peak isolation; recognition performance; remote speech recognition; runlength coding; uniform scalar quantization; unquantized cepstral features; variable frame rate analysis; Acoustic applications; Acoustic noise; Bit rate; Cepstral analysis; Discrete cosine transforms; Huffman coding; Mel frequency cepstral coefficient; Quantization; Speech enhancement; Speech recognition;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01). 2001 IEEE International Conference on

Conference_Location

Salt Lake City, UT

ISSN

1520-6149

Print_ISBN

0-7803-7041-4

Type

conf

DOI

10.1109/ICASSP.2001.940780

Filename

940780