DocumentCode
3340286
Title
An efficient and scalable 2D DCT-based feature coding scheme for remote speech recognition
Author
Zhu, Qifeng ; Alwan, Abeer
Author_Institution
Dept. of Electr. Eng., California Univ., Los Angeles, CA, USA
Volume
1
fYear
2001
fDate
2001
Firstpage
113
Abstract
A 2D DCT-based approach to compressing acoustic features for remote speech recognition applications is presented. The coding scheme computes a 2D DCT on blocks of feature vectors, followed by uniform scalar quantization, run-length coding, and Huffman coding. In digit recognition experiments, training used unquantized cepstral features from clean speech, and testing used the same features after coding and decoding with the 2D DCT and entropy coding, at various levels of acoustic noise. At low bit rates, the coding scheme yields recognition performance comparable to that obtained with unquantized features. 2D DCT coding of MFCCs (mel-frequency cepstral coefficients), together with a method for variable frame rate analysis (Zhu and Alwan, 2000) and peak isolation (Strope and Alwan, 1997), maintains the noise robustness of these algorithms at low SNRs even at 624 bps. The low-complexity scheme is scalable, resulting in graceful degradation in performance with decreasing bit rate.
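The pipeline named in the abstract (2D DCT on a block of feature vectors, uniform scalar quantization, run-length coding) can be illustrated with a minimal sketch. It is not the authors' implementation. The block shape, the quantizer step, the row-major scan order, and all function names (`dct_matrix`, `encode_block`, `decode_block`, `run_length`, `run_length_decode`) are assumptions chosen for illustration, and the final Huffman coding stage is omitted.

```python
import numpy as np

def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    i = np.arange(n).reshape(1, -1)   # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)        # DC row uses a different scale factor
    return m

def encode_block(block, step):
    """2D DCT of a (frames x coefficients) block, then uniform quantization.

    `step` is a single hypothetical quantizer step size; the abstract does
    not say how step sizes are chosen.
    """
    t, c = block.shape
    coeffs = dct_matrix(t) @ block @ dct_matrix(c).T
    return np.rint(coeffs / step).astype(int)

def decode_block(q, step):
    """Dequantize and apply the inverse 2D DCT."""
    t, c = q.shape
    return dct_matrix(t).T @ (q * step) @ dct_matrix(c)

def run_length(values):
    """Encode a sequence as (zero_run, value) pairs.

    Each nonzero value is paired with the number of zeros preceding it.
    Trailing zeros are emitted as a final (run, 0) pair.
    """
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, int(v)))
            run = 0
    if run:
        pairs.append((run, 0))
    return pairs

def run_length_decode(pairs):
    """Invert run_length()."""
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        if v != 0:
            out.append(v)
    return out
```

As a usage example, a block of 12 frames by 13 cepstral coefficients (an assumed shape) could be passed to `encode_block(block, step)`, flattened with `q.ravel().tolist()`, and handed to `run_length`. A larger `step` zeroes more DCT coefficients, which lengthens the zero runs and lowers the bit rate at the cost of reconstruction accuracy. This is one way to picture the scalability the abstract describes.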
Keywords
Huffman codes; acoustic noise; cepstral analysis; data compression; decoding; discrete cosine transforms; entropy codes; runlength codes; speech coding; speech recognition; transform coding; 624 bit/s; Huffman coding; MFCCs; acoustic features; clean speech; digit recognition experiments; efficient scalable 2D DCT-based feature coding scheme; entropy coding; feature vectors; low-complexity scheme; mel-frequency cepstral coefficients; noise robustness; peak isolation; recognition performance; remote speech recognition; runlength coding; uniform scalar quantization; unquantized cepstral features; variable frame rate analysis; Acoustic applications; Acoustic noise; Bit rate; Cepstral analysis; Discrete cosine transforms; Mel frequency cepstral coefficient; Quantization; Speech enhancement; Speech recognition
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01). 2001 IEEE International Conference on
Conference_Location
Salt Lake City, UT
ISSN
1520-6149
Print_ISBN
0-7803-7041-4
Type
conf
DOI
10.1109/ICASSP.2001.940780
Filename
940780