DocumentCode
417198
Title
Enhanced standard compliant distributed speech recognition (Aurora encoder) using rate allocation
Author
Srinivasamurthy, Naveen ; Ortega, Antonio ; Narayanan, Shrikanth
Author_Institution
Integrated Media Syst. Center, Univ. of Southern California, Los Angeles, CA, USA
Volume
1
fYear
2004
fDate
17-21 May 2004
Abstract
The paper proposes modifications to improve the recognition performance obtainable with the ETSI standard distributed speech recognition encoder, Aurora (ES 201 108, 2000). The proposed modifications are standard compliant, i.e., they require no algorithmic changes to the Aurora operation. Performance improvements are achieved by distributing the available bit budget more efficiently among Aurora's seven different two-dimensional vector quantizers (VQs). The bit-allocation algorithm incorporates the importance of each sub-vector for recognition, allocating a larger fraction of the available bits to the more important sub-vectors and hence maximizing recognition accuracy. The proposed bit-allocation algorithm is based on a novel mutual information (MI) measure. The MI measure quantifies the information shared between a sub-vector and the class label and hence is a good indicator of the importance of the sub-vector for recognition. It is shown that the proposed MI-based method outperforms both the standard Aurora encoder and an encoder designed using traditional mean-square-error-based bit-allocation. For the TIDIGITS connected digits recognition task, the proposed MI-based Aurora encoder achieves a 15.2% relative decrease in word error rate (WER) compared with the standard Aurora encoder.
Keywords
error statistics; optimisation; speech coding; speech recognition; vector quantisation; vocoders; Aurora encoder; WER; bit-allocation; connected digits recognition; enhanced distributed speech recognition encoder; mean square error; mutual information measure; rate allocation; recognition accuracy maximization; standard compliant distributed speech recognition encoder; vector quantizers; word error rate; Bandwidth; Cellular phones; Degradation; Error analysis; Mean square error methods; Mutual information; Personal digital assistants; Speech recognition; Systems engineering and theory; Telecommunication standards;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-8484-9
Type
conf
DOI
10.1109/ICASSP.2004.1326028
Filename
1326028