Regional Information Center for Science and Technology - Multi-resolution phonetic/segmental features and models for HMM-based speech recognition

DocumentCode :

310616

Title :

Multi-resolution phonetic/segmental features and models for HMM-based speech recognition

Author :

Vaseghi, Saeed ; Harte, Naomi ; Milner, B.

Author_Institution :

Queen's Univ., Belfast, UK

Volume :

fYear :

1997

fDate :

21-24 April 1997

Firstpage :

1263

Abstract :

This paper explores the modelling of phonetic segments of speech with multi-resolution spectral/time correlates. For spectral representation, a set of multi-resolution cepstral features is proposed. Cepstral features obtained from a DCT of the log energy-spectrum over the full voice-bandwidth (100-4000 Hz) are combined with higher-resolution features obtained from the DCT of the lower subband (say 100-2100 Hz) and upper subband (2100-4000 Hz) halves. This approach can be extended to several levels of different resolutions. For representation of the temporal structure of speech segments or phonetic units, the conventional cepstral and dynamic cepstral features, which represent speech at the sub-phonetic level, are supplemented by a set of phonetic features that describe the trajectory of speech over the duration of a phonetic unit. A conditional probability model for phonetic and sub-phonetic features is considered. Experiments demonstrate that the inclusion of the segmental features results in about a 10% decrease in error rates.
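The multi-resolution cepstral idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dct2` and `multires_cepstrum`, the 40-channel toy spectrum, the coefficient counts (`n_full`, `n_sub`), and the use of an unnormalised DCT-II are all assumptions made for illustration; only the 100-4000 Hz band and the 2100 Hz split come from the abstract.

```python
import numpy as np

def dct2(x, n_coeffs):
    """Unnormalised DCT-II of a 1-D array, keeping the first n_coeffs terms."""
    n = len(x)
    k = np.arange(n_coeffs)[:, None]   # output coefficient index
    m = np.arange(n)[None, :]          # input sample index
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return basis @ x

def multires_cepstrum(log_energy, freqs_hz, split_hz=2100.0, n_full=8, n_sub=4):
    """Concatenate full-band cepstra with cepstra of the lower and upper subbands.

    split_hz, n_full and n_sub are illustrative defaults, not values from the paper.
    """
    log_energy = np.asarray(log_energy, dtype=float)
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    lower = log_energy[freqs_hz < split_hz]    # channels below the split
    upper = log_energy[freqs_hz >= split_hz]   # channels at or above the split
    return np.concatenate([
        dct2(log_energy, n_full),  # coarse resolution: full band
        dct2(lower, n_sub),        # finer resolution: lower subband
        dct2(upper, n_sub),        # finer resolution: upper subband
    ])

# Toy log energy-spectrum: 40 channels spanning 100-4000 Hz (made-up values).
freqs = np.linspace(100.0, 4000.0, 40)
log_spec = np.log(1.0 + np.exp(-freqs / 1500.0))
features = multires_cepstrum(log_spec, freqs)
```

With these defaults, `features` holds 16 values: 8 full-band coefficients followed by 4 for each subband. Extending to further resolution levels, as the abstract suggests, would amount to splitting each subband again and appending the resulting coefficients.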

Keywords :

acoustic correlation; cepstral analysis; discrete cosine transforms; feature extraction; hidden Markov models; probability; signal representation; signal resolution; speech processing; speech recognition; 100 to 4000 Hz; DCT; HMM; cepstral features; conditional probability model; dynamic cepstral features; full voice-bandwidth; log energy-spectrum; modelling; multi-resolution spectral/time correlates; phonetic segments; spectral representation; speech segments; speech trajectory; sub-phonetic features; temporal structure; energy resolution; error analysis; frequency estimation; pattern classification; pattern recognition

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on

Conference_Location :

Munich

ISSN :

1520-6149

Print_ISBN :

0-8186-7919-0

Type :

conf

DOI :

10.1109/ICASSP.1997.596175

Filename :

596175

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=310616