Title :
Exploiting Temporal Correlation of Speech for Error Robust and Bandwidth Flexible Distributed Speech Recognition
Author :
Tan, Zheng-Hua ; Dalsgaard, Paul ; Lindberg, Børge
Author_Institution :
Inst. of Electron. Syst., Aalborg Univ.
fDate :
5/1/2007 12:00:00 AM
Abstract :
In this paper, the temporal correlation of speech is exploited in front-end feature extraction, client-based error recovery, and server-based error concealment (EC) for distributed speech recognition. First, the paper investigates a half frame rate (HFR) front-end that uses double frame shifting at the client side. At the server side, each HFR feature vector is duplicated to construct a full frame rate (FFR) feature sequence. This HFR front-end gives performance comparable to that of the FFR front-end but contains only half of the FFR features. Second, different arrangements of the other half of the FFR features create a set of error recovery techniques encompassing multiple description coding and interleaving schemes, where interleaving has the advantage of not introducing a delay when there are no transmission errors. Third, a subvector-based EC technique is presented in which error detection and concealment are conducted at the subvector level, as opposed to conventional techniques in which an entire vector is replaced even if only a single bit error occurs. The subvector EC is further combined with weighted Viterbi decoding. Encouraging recognition results are observed for the proposed techniques. Lastly, to understand the effects of applying various EC techniques, this paper introduces three approaches consisting of speech feature, dynamic programming distance, and hidden Markov model state duration comparison.
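The two server-side ideas in the abstract, duplicating HFR vectors to rebuild an FFR sequence and concealing errors per subvector rather than per vector, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the plain-list feature layout, and the `ok` error-flag matrix are hypothetical, and two simplifications are assumed: the doubled frame shift is emulated by keeping every second FFR frame, and an erroneous subvector is replaced by the same subvector from the nearest error-free frame.

```python
from typing import List, Optional, Tuple

Vector = List[float]


def half_frame_rate(ffr: List[Vector]) -> List[Vector]:
    """Client side (assumed): keep every second frame, emulating a doubled frame shift."""
    return ffr[::2]


def reconstruct_full_rate(hfr: List[Vector], n_frames: Optional[int] = None) -> List[Vector]:
    """Server side: duplicate each HFR vector to rebuild a full-rate sequence."""
    out: List[Vector] = []
    for v in hfr:
        out.append(list(v))
        out.append(list(v))
    # Optionally trim to the original length when it was odd.
    return out if n_frames is None else out[:n_frames]


def conceal_subvectors(
    frames: List[Vector],
    ok: List[List[bool]],
    bounds: List[Tuple[int, int]],
) -> List[Vector]:
    """Replace only the subvectors flagged as erroneous.

    ok[t][k] is True when subvector k of frame t passed error detection.
    bounds[k] = (start, end) gives the index range of subvector k.
    Assumed strategy: copy from the nearest error-free frame, preferring
    the earlier frame when two candidates are equally close.
    """
    out = [list(f) for f in frames]
    n = len(frames)
    for k, (lo, hi) in enumerate(bounds):
        good = [t for t in range(n) if ok[t][k]]
        if not good:
            continue  # no reliable source for this subvector; leave as received
        for t in range(n):
            if ok[t][k]:
                continue
            src = min(good, key=lambda g: (abs(g - t), g))
            out[t][lo:hi] = frames[src][lo:hi]
    return out


if __name__ == "__main__":
    ffr = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1], [4.0, 4.1]]
    hfr = half_frame_rate(ffr)
    print(reconstruct_full_rate(hfr, len(ffr)))

    received = [[1, 2, 3, 4], [9, 9, 5, 6], [7, 8, 9, 9]]
    flags = [[True, True], [False, True], [True, False]]
    print(conceal_subvectors(received, flags, [(0, 2), (2, 4)]))
```

In the second example, only the first subvector of frame 1 and the second subvector of frame 2 are replaced; the correctly received halves of those frames are kept, which is the contrast with whole-vector replacement that the abstract draws.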
Keywords :
Viterbi decoding; client-server systems; dynamic programming; feature extraction; hidden Markov models; interleaved codes; speech coding; speech recognition; voice communication; bandwidth flexible distributed speech recognition; client-based error recovery; double frame shifting; dynamic programming distance; error recovery techniques; front-end feature extraction; half frame rate; hidden Markov model; interleaving schemes; multiple description coding; server-based error concealment; speech feature; speech temporal correlation; subvector-based EC technique; Bandwidth; Decoding; Delay; Dynamic programming; Feature extraction; Hidden Markov models; Interleaved codes; Robustness; Speech recognition; Viterbi algorithm; Distributed speech recognition (DSR); error concealment (EC); error recovery; low bit-rate; split vector quantization (SVQ)
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2006.889799