Speaker normalization based on the generalized time-frequency representation and Mellin transform

Author

Dongmei, Jiang ; Rongchun, Zhao

Author_Institution

Dept. of Comput. Sci. & Eng., Northwestern Polytech. Univ., Xian, China

Volume

2

fYear

2000

fDate

2000

Firstpage

782

Abstract

For vocal tract length normalization in speaker-independent speech recognition, a novel feature extraction method is proposed that combines the generalized time-frequency representation with a cone-shaped kernel (CK-GTFR) and the Mellin transform. The GTFR is superior to other representations in suppressing cross terms while providing good time and frequency resolution simultaneously. The Mellin transform makes the features insensitive to differences in vocal tract length. F-ratio tests show that the proposed features have higher separation ability than the FFT cepstrum and the FFT-Mellin cepstrum, and are superior to the Mel cepstrum in most cases.
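
The scale-invariance property behind the Mellin step can be illustrated with a minimal sketch. It is not the paper's method: it omits the CK-GTFR entirely and uses a toy spectral envelope. It assumes that a change in vocal tract length acts as a uniform frequency scaling S(f) -> S(alpha * f), and it evaluates the Mellin transform along s = j*omega as an FFT over log-frequency. The function names, formant positions, grid limits, and alpha = 1.2 are all hypothetical choices made for illustration.

```python
import numpy as np

def envelope(f):
    """Toy spectral envelope: three formant-like Gaussian peaks (assumed values, in Hz)."""
    centers = (700.0, 1200.0, 2600.0)
    widths = (80.0, 100.0, 140.0)
    return sum(np.exp(-0.5 * ((f - c) / w) ** 2) for c, w in zip(centers, widths))

def mellin_magnitude(spectrum_fn, f_min=50.0, f_max=20000.0, n=4096):
    """Magnitude of the Mellin transform along s = j*omega.

    Substituting f = exp(u) turns the Mellin integral into a Fourier
    integral over u, so sampling on a log-frequency grid and taking an
    FFT gives a discrete approximation.
    """
    u = np.linspace(np.log(f_min), np.log(f_max), n, endpoint=False)
    return np.abs(np.fft.rfft(spectrum_fn(np.exp(u))))

def linear_magnitude(spectrum_fn, f_max=8000.0, n=4096):
    """FFT magnitude over a linear frequency grid, for comparison."""
    f = np.linspace(0.0, f_max, n, endpoint=False)
    return np.abs(np.fft.rfft(spectrum_fn(f)))

alpha = 1.2  # assumed frequency-scaling factor between two speakers

def warped(f):
    """The same envelope as seen through a uniformly scaled frequency axis."""
    return envelope(alpha * f)

ref_m = mellin_magnitude(envelope)
mellin_err = np.max(np.abs(ref_m - mellin_magnitude(warped))) / np.max(ref_m)

ref_l = linear_magnitude(envelope)
linear_err = np.max(np.abs(ref_l - linear_magnitude(warped))) / np.max(ref_l)

print(f"relative change, log-frequency (Mellin) magnitude: {mellin_err:.2e}")
print(f"relative change, linear-frequency FFT magnitude:   {linear_err:.2e}")
```

On the log-frequency grid the scaling becomes a pure shift, which the FFT magnitude discards, so the first figure should be close to zero. The linear-frequency magnitude changes noticeably under the same warp.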

Keywords

feature extraction; signal representation; speech recognition; time-frequency analysis; transforms; F-ratio tests; FFT cepstrum; FFT-Mellin cepstrum; Mel cepstrum; Mellin transform; cone-shaped kernel; cross terms suppression; feature extraction method; frequency resolution; generalized time-frequency representation; speaker normalization; speaker-independent speech recognition; time resolution; vocal tract length normalization; vocal tract lengths; Cepstrum; Computer science; Feature extraction; Fourier transforms; Kernel; Loudspeakers; Signal resolution; Speech recognition; Testing; Time frequency analysis

fLanguage

English

Publisher

IEEE

Conference_Titel

Signal Processing Proceedings, 2000. WCCC-ICSP 2000. 5th International Conference on

Conference_Location

Beijing

Print_ISBN

0-7803-5747-7

Type

conf

DOI

10.1109/ICOSP.2000.891628

Filename

891628