DocumentCode :
2790404
Title :
Comparison of modulation features for phoneme recognition
Author :
Ganapathy, Sriram ; Thomas, Samuel ; Hermansky, Hynek
Author_Institution :
Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD, USA
fYear :
2010
fDate :
14-19 March 2010
Firstpage :
5038
Lastpage :
5041
Abstract :
In this paper, we use a phoneme recognition system to compare several approaches for extracting modulation frequency features from the speech signal. The general framework in these approaches is to decompose the speech signal into a set of sub-bands. Amplitude modulations (AM) in the sub-band signals are used to derive features for automatic speech recognition (ASR). We then propose a feature extraction technique that uses autoregressive (AR) models of sub-band Hilbert envelopes over relatively long segments of the speech signal. The AR models of the Hilbert envelopes are derived using frequency domain linear prediction (FDLP). Features are formed by converting the FDLP envelopes into static and dynamic modulation frequency components. In phoneme recognition experiments on the TIMIT database, the FDLP-based modulation frequency features provide significant improvements over the other techniques (an average relative improvement of 7.5% over the baseline features). Furthermore, a detailed analysis is performed to determine the relative contribution of the various processing stages in the proposed technique.
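The abstract describes FDLP only at a high level. As a rough illustration, here is a minimal sketch assuming a common reading of FDLP: take the DCT of a long signal segment, split the DCT coefficients into sub-bands, fit an all-pole model to each sub-band by autocorrelation linear prediction, and read the AR model's power response along the dual (time) axis as an estimate of the squared Hilbert envelope of that sub-band. The uniform band split, the model order of 20, the small white-noise correction and the helper names (`dct_ii`, `levinson_durbin`, `fdlp_envelopes`) are hypothetical choices for this sketch, not the paper's parameters, and the conversion into static and dynamic modulation features is not shown.

```python
import numpy as np


def dct_ii(x: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-II via an explicit cosine basis (O(N^2), fine for a demo)."""
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * t + 1) / (2 * n))
    c = basis @ x
    c[0] *= np.sqrt(1.0 / n)
    c[1:] *= np.sqrt(2.0 / n)
    return c


def levinson_durbin(r: np.ndarray, order: int) -> tuple[np.ndarray, float]:
    """Solve the autocorrelation normal equations.

    Returns the prediction polynomial A(z) = 1 + a1 z^-1 + ... and the residual power.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = float(r[0])
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def fdlp_envelopes(x: np.ndarray, n_bands: int = 4, order: int = 20) -> np.ndarray:
    """Hypothetical FDLP sketch: one AR envelope per sub-band, one column per time sample."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    c = dct_ii(x)
    # Hypothetical uniform split of the DCT axis (a real front end would use perceptual bands).
    edges = np.linspace(0, n, n_bands + 1).astype(int)
    envs = np.empty((n_bands, n))
    for b in range(n_bands):
        cb = c[edges[b]:edges[b + 1]]
        m = len(cb)
        # Biased autocorrelation of the sub-band DCT coefficients, lags 0..order.
        r = np.array([np.dot(cb[: m - lag], cb[lag:]) for lag in range(order + 1)])
        # Small white-noise correction keeps the Toeplitz system well conditioned in quiet bands.
        r[0] += 1e-9 * r[0] + 1e-12
        a, err = levinson_durbin(r, order)
        # Evaluate the AR power response at w = pi * t / n: in FDLP this axis is time.
        a_resp = np.fft.fft(a, 2 * n)[:n]
        envs[b] = err / np.abs(a_resp) ** 2
    return envs
```

For example, a Gaussian-windowed tone centred early in a 512-sample segment (`x = np.exp(-0.5 * ((t - 128) / 20.0) ** 2) * np.cos(2 * np.pi * 0.1 * t)` with `t = np.arange(512)`) should give a band-0 envelope whose energy is concentrated in the first half of the segment. Fitting the AR model in the DCT domain is what makes the all-pole model track temporal rather than spectral peaks, which is the duality the abstract relies on.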
Keywords :
acoustic signal processing; amplitude modulation; autoregressive processes; frequency modulation; speech recognition; amplitude modulations; automatic speech recognition; autoregressive models; frequency domain linear prediction; modulation frequency; phoneme recognition; speech signal; sub-band Hilbert envelopes; sub-band signal; Amplitude modulation; Automatic speech recognition; Feature extraction; Frequency conversion; Frequency domain analysis; Frequency modulation; Performance analysis; Predictive models; Spatial databases; Speech recognition; Feature Extraction; Frequency domain linear prediction (FDLP); Modulations; Phoneme recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location :
Dallas, TX
ISSN :
1520-6149
Print_ISBN :
978-1-4244-4295-9
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2010.5495057
Filename :
5495057