Title :
MMSE-Based Packet Loss Concealment for CELP-Coded Speech Recognition
Author :
Carmona, José L. ; Peinado, Antonio M. ; Pérez-Córdoba, José L. ; Gómez, Angel M.
Author_Institution :
Dept. de Teor. de la Señal, Telemática y Comun., Univ. de Granada, Granada, Spain
Abstract :
In this paper, we analyze the performance of network speech recognition (NSR) over IP networks, adapting existing solutions to the packet loss problem for code excited linear prediction (CELP) codecs and proposing new ones. NSR has a client-server architecture that places the recognizer at the server side and uses a standard speech codec for speech transmission. Its main advantage is that no changes are required to existing client devices and networks. However, the use of speech codecs degrades its performance, mainly in the presence of packet losses. First, we study the degradations introduced by CELP codecs in lossy packet networks. Then, we propose a reconstruction technique based on minimum mean square error (MMSE) estimation using hidden Markov models. This approach also allows us to obtain reliability measures associated with each estimate. We show how to use this information to improve recognition performance by means of soft-data decoding and the weighted Viterbi algorithm. The experimental results are obtained for two well-known CELP codecs, G.729 and AMR 12.2 kbps, with recognition performed on decoded speech. Finally, we analyze an efficient and improved implementation of the proposed techniques using an NSR system that extracts speech recognition features directly from the bit-stream parameters. The experimental results show that the different proposed NSR systems achieve performance comparable to that of distributed speech recognition (DSR).
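The central idea of the abstract, MMSE estimation of lost frames from a hidden Markov model together with a per-estimate reliability measure, can be illustrated with a small sketch. This is not the authors' implementation; it rests on assumptions chosen for clarity. Each hidden state is a hypothetical vector-quantizer centroid c_i of a speech feature vector, a received frame is assumed to reveal its centroid index exactly, and a lost frame contributes no evidence. A scaled forward-backward pass gives P(s_t = i | received frames), the MMSE estimate is the posterior-weighted mean sum_i P(s_t = i | received frames) c_i, and the posterior variance serves as the reliability measure. The function names (state_posteriors, mmse_conceal) and the toy parameters are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def _normalize(v: List[float]) -> List[float]:
    # Rescale to sum to 1; per-frame scaling avoids underflow on long bursts.
    s = sum(v)
    if s <= 0.0:
        raise ValueError("observation sequence has zero probability under the model")
    return [x / s for x in v]


def state_posteriors(pi: Sequence[float], A: Sequence[Sequence[float]],
                     obs: Sequence[Optional[int]]) -> List[List[float]]:
    """Scaled forward-backward: P(s_t = i | received frames) for every frame t."""
    n, T = len(pi), len(obs)

    def emit(t: int, i: int) -> float:
        # Assumption: a lost frame (None) is uninformative, while a received
        # frame pins the hidden state to its quantizer index.
        return 1.0 if obs[t] is None or obs[t] == i else 0.0

    # Forward recursion, normalized at every frame.
    alpha = [_normalize([pi[i] * emit(0, i) for i in range(n)])]
    for t in range(1, T):
        prev = alpha[-1]
        alpha.append(_normalize([
            emit(t, j) * sum(prev[i] * A[i][j] for i in range(n))
            for j in range(n)
        ]))

    # Backward recursion, also normalized at every frame.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = _normalize([
            sum(A[i][j] * emit(t + 1, j) * beta[t + 1][j] for j in range(n))
            for i in range(n)
        ])

    # The per-frame scale factors cancel after this final normalization.
    return [_normalize([a * b for a, b in zip(alpha[t], beta[t])]) for t in range(T)]


def mmse_conceal(pi: Sequence[float], A: Sequence[Sequence[float]],
                 centroids: Sequence[Vector],
                 obs: Sequence[Optional[int]]) -> List[Tuple[Vector, Vector]]:
    """Return (MMSE estimate, per-dimension posterior variance) for each frame."""
    dim = len(centroids[0])
    out = []
    for gamma in state_posteriors(pi, A, obs):
        mean = [sum(g * c[d] for g, c in zip(gamma, centroids)) for d in range(dim)]
        var = [sum(g * (c[d] - mean[d]) ** 2 for g, c in zip(gamma, centroids))
               for d in range(dim)]
        out.append((mean, var))
    return out


if __name__ == "__main__":
    pi = [0.5, 0.5]
    A = [[0.9, 0.1], [0.1, 0.9]]
    centroids = [[0.0], [1.0]]
    # Frame 1 is lost between a class-0 frame and a class-1 frame.
    for mean, var in mmse_conceal(pi, A, centroids, [0, None, 1]):
        print(mean, var)
```

With this symmetric two-state toy model, the lost middle frame receives an estimate halfway between the two centroids and the largest posterior variance, while the received frames have zero variance. A soft-data decoder or a weighted Viterbi algorithm would rely less on such low-reliability frames; how the paper maps variance to a decoding weight is not specified in this record, so that step is left out of the sketch.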
Keywords :
client-server systems; least mean squares methods; speech coding; speech recognition; CELP codecs; CELP-coded speech recognition; IP networks; MMSE estimation; MMSE-based packet loss concealment; client-server architecture; code excited linear prediction; distributed speech recognition; hidden Markov models; minimum mean square error; network speech recognition; speech transmission; Code excited linear prediction (CELP); minimum mean square error (MMSE) estimation; network speech recognition (NSR); packet loss; soft-data; weighted Viterbi algorithm;
Journal_Title :
IEEE Transactions on Audio, Speech, and Language Processing
DOI :
10.1109/TASL.2009.2033891