Regional Information Center for Science and Technology (RICeST) - Comparison of modulation features for phoneme recognition

DocumentCode :

2790404

Title :

Comparison of modulation features for phoneme recognition

Author :

Ganapathy, Sriram ; Thomas, Samuel ; Hermansky, Hynek

Author_Institution :

Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD, USA

fYear :

2010

fDate :

14-19 March 2010

Firstpage :

5038

Lastpage :

5041

Abstract :

In this paper, we compare several approaches for extracting modulation frequency features from the speech signal using a phoneme recognition system. The general framework in these approaches is to decompose the speech signal into a set of sub-bands; amplitude modulations (AM) in each sub-band signal are then used to derive features for automatic speech recognition (ASR). We then propose a feature extraction technique that uses autoregressive (AR) models of sub-band Hilbert envelopes in relatively long segments of the speech signal. The AR models of the Hilbert envelopes are derived using frequency domain linear prediction (FDLP). Features are formed by converting the FDLP envelopes into static and dynamic modulation frequency components. In phoneme recognition experiments on the TIMIT database, the FDLP-based modulation frequency features provide significant improvements over the other techniques (an average relative improvement of 7.5% over the baseline features). Furthermore, a detailed analysis is performed to determine the relative contribution of the various processing stages in the proposed technique.
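The FDLP step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it skips the sub-band decomposition and the conversion to static and dynamic modulation features, and only shows the core idea that linear prediction applied to the DCT of a signal yields an all-pole approximation of its squared Hilbert envelope. The function names, the model order of 8, the small diagonal loading, and the toy burst signal are all assumptions made for illustration. Only the Python standard library is used so the example stays self-contained.

```python
import cmath
import math


def dct2(x):
    """Unnormalized DCT-II of a real sequence (O(N^2), fine for a short demo)."""
    n_pts = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts))
            for n in range(n_pts))
        for k in range(n_pts)
    ]


def autocorr(c, max_lag):
    """Autocorrelation sums of c for lags 0..max_lag (autocorrelation method)."""
    return [
        sum(c[k] * c[k + lag] for k in range(len(c) - lag))
        for lag in range(max_lag + 1)
    ]


def levinson(r, order):
    """Levinson-Durbin recursion: AR polynomial a (a[0] = 1) and residual energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, order=8, loading=1e-6):
    """All-pole estimate of the squared Hilbert envelope, one value per sample."""
    n_pts = len(x)
    r = autocorr(dct2(x), order)
    r[0] *= 1.0 + loading  # assumption: tiny diagonal loading for numerical stability
    a, err = levinson(r, order)
    env = []
    for n in range(n_pts):
        theta = math.pi * (n + 0.5) / n_pts  # time index mapped to AR "frequency"
        resp = sum(a[i] * cmath.exp(-1j * theta * i) for i in range(order + 1))
        env.append(err / abs(resp) ** 2)
    return env


# Toy input (hypothetical): a Gaussian-shaped burst centred at sample 90
# riding on a cosine carrier.
N_SAMPLES, CENTER, WIDTH = 256, 90, 8.0
signal = [
    math.exp(-0.5 * ((n - CENTER) / WIDTH) ** 2) * math.cos(0.3 * math.pi * n)
    for n in range(N_SAMPLES)
]
envelope = fdlp_envelope(signal)
peak = max(range(N_SAMPLES), key=envelope.__getitem__)
print("envelope peaks near sample", peak)
```

Because the prediction runs over DCT coefficients rather than time samples, the poles of the AR model track peaks of the temporal envelope instead of spectral peaks. In a sub-band setting, the same routine would presumably be applied to a windowed portion of the DCT coefficients for each band.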

Keywords :

acoustic signal processing; amplitude modulation; autoregressive processes; frequency modulation; speech recognition; amplitude modulations; automatic speech recognition; autoregressive models; frequency domain linear prediction (FDLP); modulation frequency; phoneme recognition; speech signal; sub-band Hilbert envelopes; sub-band signal; feature extraction; frequency conversion; frequency domain analysis; performance analysis; predictive models; spatial databases; modulations

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on

Conference_Location :

Dallas, TX

ISSN :

1520-6149

Print_ISBN :

978-1-4244-4295-9

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2010.5495057

Filename :

5495057

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2790404