DocumentCode :
2971840
Title :
Short-time instantaneous frequency and bandwidth features for speech recognition
Author :
Tsiakoulis, Pirros ; Potamianos, Alexandros ; Dimitriadis, Dimitrios
Author_Institution :
Sch. of Electr. & Comput. Eng., Nat. Tech. Univ. of Athens, Athens, Greece
fYear :
2009
fDate :
Dec. 13 2009-Dec. 17 2009
Firstpage :
103
Lastpage :
106
Abstract :
In this paper, we investigate the performance of modulation-related features and normalized spectral moments for automatic speech recognition. We focus on the short-time averages of the amplitude-weighted instantaneous frequencies and bandwidths, computed in each subband of a mel-spaced filterbank. Similar features have been proposed in previous studies and have been successfully combined with MFCCs for speech and speaker recognition. Our goal is to investigate the stand-alone performance of these features. First, it is shown experimentally that the proposed features are only moderately correlated in the frequency domain and, unlike MFCCs, do not require a transformation to the cepstral domain. Next, the filterbank parameters (number of filters and filter overlap) are investigated for the proposed features and compared with those of MFCCs. Results show that the frequency-related features perform at least as well as MFCCs in clean conditions and yield superior results in noisy conditions, with up to 50% relative error rate reduction on the AURORA3 Spanish task.
Keywords :
filtering theory; speaker recognition; automatic speech recognition; mel-spaced filterbank; normalized spectral moments; short-time instantaneous bandwidth features; short-time instantaneous frequency features; Amplitude estimation; Amplitude modulation; Bandwidth; Feature extraction; Filter bank; Frequency domain analysis; Frequency estimation; Frequency modulation; Resonance; Speech recognition; AM-FM; filterbank overlap; instantaneous bandwidth; instantaneous frequency
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
Conference_Location :
Merano
Print_ISBN :
978-1-4244-5478-5
Electronic_ISBN :
978-1-4244-5479-2
Type :
conf
DOI :
10.1109/ASRU.2009.5373305
Filename :
5373305