Title :
Robust Distant Speech Recognition by Combining Position-Dependent CMN with Conventional CMN
Author :
Wang, Longbiao ; Kitaoka, Norihide ; Nakagawa, Seiichi
Author_Institution :
Dept. of Inf. & Comput. Sci., Toyohashi Univ. of Technol., Japan
Abstract :
We previously proposed an environmentally robust speech recognition method based on position-dependent cepstral mean normalization (PDCMN), which compensates for channel distortion that depends on speaker position. PDCMN efficiently compensates for the channel transmission characteristics, but it cannot normalize speaker variation because the position-dependent cepstral mean does not contain speaker characteristics. Conventional CMN can compensate for speaker variation, but it does not achieve good recognition performance for short utterances. In this paper, we propose a robust distant speech recognition method that combines PDCMN with conventional CMN to address these problems. The position-dependent cepstral mean is linearly combined with the conventional cepstral mean in one of two ways. The first method, called fixed-weight combinational CMN, uses a single fixed weighting coefficient over the whole test data to obtain the combined cepstral mean. The second method, called variable-weight combinational CMN, calculates at each frame the output probabilities of multiple features compensated with a variable weighting coefficient, and a single decoder uses these output probabilities to perform speech recognition. We evaluated the proposed method on small-vocabulary (100 words) distant isolated word recognition in a real environment. The proposed variable-weight combinational CMN method achieved relative error reduction rates of 56.3% over conventional CMN and 22.2% over PDCMN.
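The fixed-weight combination described in the abstract can be illustrated with a minimal sketch. The abstract states only that the two cepstral means are linearly combined, so the function names, the `weight` parameter, and the convention that `weight` multiplies the position-dependent mean are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def cepstral_mean(frames: Sequence[Sequence[float]]) -> List[float]:
    """Per-dimension mean of the cepstral vectors of one utterance."""
    if not frames:
        raise ValueError("utterance has no frames")
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def combined_cmn(
    frames: Sequence[Sequence[float]],
    pd_mean: Sequence[float],
    weight: float,
) -> List[List[float]]:
    """Subtract a linear mix of two cepstral means from every frame.

    pd_mean is a position-dependent cepstral mean, assumed to have been
    estimated beforehand for the speaker's position. weight is a
    hypothetical mixing coefficient in [0, 1]: 1.0 uses only pd_mean
    (PDCMN), 0.0 uses only the utterance mean (conventional CMN).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    utt_mean = cepstral_mean(frames)
    mixed = [weight * p + (1.0 - weight) * u for p, u in zip(pd_mean, utt_mean)]
    return [[c - m for c, m in zip(frame, mixed)] for frame in frames]
```

In this sketch, the fixed-weight method would call `combined_cmn` with one `weight` for the whole test set. The variable-weight method would instead produce several normalized feature streams with different weights and let the decoder use their per-frame output probabilities; how those probabilities are merged is not specified in the abstract and is not modeled here.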
Keywords :
combinatorial mathematics; speaker recognition; channel distortion; channel transmission characteristics; fixed-weight combinational CMN; position-dependent CMN; position-dependent cepstral mean normalization; robust distant speech recognition; speaker position; variable weighting coefficient; Automatic speech recognition; Cepstral analysis; Decoding; Microphones; Probability; Robustness; Safety; Speech recognition; Testing; Vocabulary; Robust speech recognition; conventional CMN; distant-talking environments; multiple microphone processing
Conference_Title :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0727-3
DOI :
10.1109/ICASSP.2007.367038