Abstract :
Amplitude modulation (AM) and frequency modulation (FM) are well defined and well studied in the context of communications systems (S. Haykin, 1994). Building on these ideas, several researchers have applied AM-FM modeling to speech signals (V. Tyagi et al., 2003; M. Athineos et al., 2004; Q. Zhu and A. Alwan, 2000; B.E.D. Kingsbury et al., 1998), with mixed results. These techniques have varied in their definitions of the AM and FM signals and, consequently, in the demodulation methods they use. In this paper, we carefully define AM and FM signals in the context of automatic speech recognition (ASR). We show that a theoretically meaningful estimate of the AM signal requires decomposing the speech signal into several narrow spectral bands, as opposed to the previous use of the speech modulation spectrum (V. Tyagi et al., 2003; M. Athineos et al., 2004; Q. Zhu and A. Alwan, 2000; B.E.D. Kingsbury et al., 1998), which was derived by decomposing the speech signal into increasingly wide spectral bands (such as critical, Bark, or Mel bands). Because of the Hilbert relationships, the AM signal induces a component in the FM signal that is fully determinable from the AM signal (R. Kumaresan and A. Rao, 1999; V. Tyagi and C. Wellekens, 2005). We present a novel homomorphic filtering technique that suppresses this redundant component and extracts the remaining FM signal. The estimated AM message signals are downsampled and their lower DCT coefficients are retained as speech features. These features carry information that is complementary to the MFCCs. A Tandem (H. Hermansky, 2003) combination of these two features is shown to improve recognition accuracy.
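The AM feature steps named in the abstract (narrow-band signal, AM envelope, downsampling, lower DCT coefficients) can be illustrated with a rough sketch. This is not the paper's algorithm: it assumes the envelope is taken as the magnitude of the FFT-based analytic signal, downsamples by plain slicing with no anti-aliasing filter, and uses an unnormalized DCT-II. The function names, the decimation factor, the number of coefficients, and the synthetic test tone are all hypothetical choices made for illustration.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: zero the negative frequencies, double the positive ones."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC as is
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin as is
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def dct2(x, num_coeffs):
    """First num_coeffs coefficients of an unnormalized DCT-II."""
    n = len(x)
    k = np.arange(num_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return basis @ x

def am_features(band_signal, decimation=4, num_coeffs=5):
    """Envelope -> naive downsampling -> lower DCT coefficients (illustrative values)."""
    envelope = np.abs(analytic_signal(band_signal))   # assumed AM estimate
    downsampled = envelope[::decimation]              # assumption: no anti-aliasing filter
    return dct2(downsampled, num_coeffs)

# Hypothetical narrow-band test signal: a 1 kHz carrier with a slow 10 Hz AM message.
fs = 8000
t = np.arange(800) / fs
message = 1.0 + 0.5 * np.cos(2 * np.pi * 10 * t)
band = message * np.cos(2 * np.pi * 1000 * t)

feats = am_features(band)
```

In this toy case the message is slowly varying relative to the carrier, so the envelope of the analytic signal recovers it; for a wide band containing several spectral components that interpretation breaks down, which is the motivation the abstract gives for using narrow bands.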
Keywords :
demodulation; discrete cosine transforms; filtering theory; signal sampling; speech recognition; AM signal; ASR; Bark spectral band; DCT coefficients; FM signal; Hilbert relationships; Mel spectral band; Tandem combination; amplitude modulation; carrier signal decomposition; communications systems; critical spectral band; demodulation methods; fepstrum signal decomposition; frequency modulation; homomorphic filtering technique; narrow spectral band; speech modulation spectrum; speech signals; Amplitude modulation; Automatic speech recognition; Context; Data mining; Demodulation; Discrete cosine transforms; Estimation theory; Filtering; Frequency modulation; Signal resolution;
Conference_Title :
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on