Enhanced standard compliant distributed speech recognition (Aurora encoder) using rate allocation

Author

Srinivasamurthy, Naveen ; Ortega, Antonio ; Narayanan, Shrikanth

Author_Institution

Integrated Media Syst. Center, Univ. of Southern California, Los Angeles, CA, USA

Volume

1

fYear

2004

fDate

17-21 May 2004

Abstract

The paper proposes modifications that improve the recognition performance of the ETSI standard distributed speech recognition encoder, Aurora (ES 201 108, 2000). The modifications are standard compliant, i.e., they require no algorithmic changes to the Aurora operation. The improvement comes from distributing the available bit budget more efficiently among Aurora's seven different two-dimensional vector quantizers (VQs). The bit-allocation algorithm takes into account how important each sub-vector is for recognition: a larger fraction of the available bits goes to the more important sub-vectors, which maximizes recognition accuracy. The algorithm is based on a novel mutual information (MI) measure, which quantifies the information shared between a sub-vector and the class label and is therefore a good indicator of the sub-vector's importance for recognition. The proposed MI-based method is shown to outperform both the standard Aurora encoder and an encoder designed using traditional mean-square-error-based bit allocation. For the TIDIGITS connected digits recognition task, the modified MI-based Aurora encoder achieves a 15.2% relative decrease in word error rate (WER) compared with the standard Aurora encoder.
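The abstract does not spell out the allocation procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that the MI between each quantized sub-vector and the class label has already been estimated for every candidate codebook size, and it substitutes a simple greedy marginal-gain rule for whatever optimization the paper actually uses. The function names `mutual_information` and `greedy_allocate` and the `mi_tables` layout are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(codes, labels):
    """Empirical MI (in bits) between two equal-length discrete sequences.

    `codes` are VQ codeword indices and `labels` are class labels;
    both names are illustrative assumptions, not taken from the paper.
    """
    n = len(codes)
    joint = Counter(zip(codes, labels))
    count_code = Counter(codes)
    count_label = Counter(labels)
    mi = 0.0
    for (c, y), n_cy in joint.items():
        # p(c, y) * log2( p(c, y) / (p(c) * p(y)) ), written with raw counts
        mi += (n_cy / n) * log2(n_cy * n / (count_code[c] * count_label[y]))
    return mi


def greedy_allocate(mi_tables, total_bits):
    """Greedy bit allocation (an assumed stand-in for the paper's method).

    mi_tables[i][b] is the MI obtained when sub-vector i is quantized
    with b bits. Each step gives one bit to the sub-vector whose MI
    increases the most.
    """
    bits = [0] * len(mi_tables)
    for _ in range(total_bits):
        best, best_gain = None, float("-inf")
        for i, table in enumerate(mi_tables):
            if bits[i] + 1 < len(table):  # a larger codebook is still available
                gain = table[bits[i] + 1] - table[bits[i]]
                if gain > best_gain:
                    best, best_gain = i, gain
        if best is None:  # every sub-vector is already at its largest codebook
            break
        bits[best] += 1
    return bits
```

With two hypothetical sub-vectors whose MI tables are `[0.0, 0.5, 0.6]` and `[0.0, 0.9, 1.5]`, a three-bit budget gives two bits to the second sub-vector and one to the first, since the second gains more MI per added bit. A mean-square-error-based allocator would rank sub-vectors by distortion reduction instead of MI gain.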

Keywords

error statistics; optimisation; speech coding; speech recognition; vector quantisation; vocoders; Aurora encoder; WER; bit-allocation; connected digits recognition; enhanced distributed speech recognition encoder; mean square error; mutual information measure; rate allocation; recognition accuracy maximization; standard compliant distributed speech recognition encoder; vector quantizers; word error rate; Bandwidth; Cellular phones; Degradation; Error analysis; Mean square error methods; Mutual information; Personal digital assistants; Speech recognition; Systems engineering and theory; Telecommunication standards

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on

ISSN

1520-6149

Print_ISBN

0-7803-8484-9

Type

conf

DOI

10.1109/ICASSP.2004.1326028

Filename

1326028