DocumentCode :
3340286
Title :
An efficient and scalable 2D DCT-based feature coding scheme for remote speech recognition
Author :
Zhu, Qifeng ; Alwan, Abeer
Author_Institution :
Dept. of Electr. Eng., California Univ., Los Angeles, CA, USA
Volume :
1
fYear :
2001
fDate :
2001
Firstpage :
113
Abstract :
A 2D DCT-based approach to compressing acoustic features for remote speech recognition applications is presented. The coding scheme computes a 2D DCT on blocks of feature vectors, followed by uniform scalar quantization, run-length coding, and Huffman coding. In digit recognition experiments, training used unquantized cepstral features from clean speech, and testing used the same features after coding and decoding with the 2D DCT and entropy coding, at various levels of acoustic noise. At low bit rates, the coding scheme yields recognition performance comparable to that obtained with unquantized features. 2D DCT coding of MFCCs (mel-frequency cepstral coefficients), together with a method for variable frame rate analysis (Zhu and Alwan, 2000) and peak isolation (Strope and Alwan, 1997), maintains the noise robustness of these algorithms at low SNRs even at 624 bps. The low-complexity scheme is scalable, resulting in graceful degradation in performance with decreasing bit rate.
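The pipeline named in the abstract (2D DCT on a block of feature vectors, uniform scalar quantization, run-length coding, Huffman coding) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the row-major scan order, the (zero run, value) run-length format, and the idea of passing a single step size are all assumptions made for illustration; the abstract does not specify block size, step sizes, or scan order.

```python
import heapq
import math
from collections import Counter


def dct_1d(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            s += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def transform_2d(block, f):
    """Apply the 1D transform f along rows, then along columns."""
    rows = [f(list(r)) for r in block]
    cols = [f(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def quantize(coeffs, step):
    """Uniform scalar quantization with one step size (an assumption)."""
    return [[int(round(v / step)) for v in row] for row in coeffs]


def dequantize(indices, step):
    return [[v * step for v in row] for row in indices]


def run_length_encode(values):
    """Emit (zero_run, nonzero_value) pairs; trailing zeros become (run, 0)."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:
        pairs.append((run, 0))
    return pairs


def run_length_decode(pairs):
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        if v != 0:
            out.append(v)
    return out


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # The integer tie-breaker keeps heap comparisons away from the dicts.
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def encode_block(block, step):
    """Block of feature vectors (frames x coefficients) -> bit string and code table."""
    indices = quantize(transform_2d(block, dct_1d), step)
    flat = [v for row in indices for v in row]  # row-major scan (an assumption)
    pairs = run_length_encode(flat)
    table = huffman_code(pairs)
    return "".join(table[p] for p in pairs), table
```

In this sketch a larger `step` zeroes more DCT coefficients, which lengthens the zero runs and shortens the bit string. That is one simple way to trade bit rate against feature accuracy, in the spirit of the scalability the abstract describes.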
Keywords :
Huffman codes; acoustic noise; cepstral analysis; data compression; decoding; discrete cosine transforms; entropy codes; runlength codes; speech coding; speech recognition; transform coding; 624 bit/s; Huffman coding; MFCCs; acoustic features; acoustic noise; clean speech; decoding; digit recognition experiments; efficient scalable 2D DCT-based feature coding scheme; entropy coding; feature vectors; low-complexity scheme; mel-frequency cepstral coefficients; noise robustness; peak isolation; recognition performance; remote speech recognition; runlength coding; uniform scalar quantization; unquantized cepstral features; variable frame rate analysis; Acoustic applications; Acoustic noise; Bit rate; Cepstral analysis; Discrete cosine transforms; Huffman coding; Mel frequency cepstral coefficient; Quantization; Speech enhancement; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01). 2001 IEEE International Conference on
Conference_Location :
Salt Lake City, UT
ISSN :
1520-6149
Print_ISBN :
0-7803-7041-4
Type :
conf
DOI :
10.1109/ICASSP.2001.940780
Filename :
940780