DocumentCode :
310616
Title :
Multi-resolution phonetic/segmental features and models for HMM-based speech recognition
Author :
Vaseghi, Saeed ; Harte, Naomi ; Milner, B.
Author_Institution :
Queen's Univ., Belfast, UK
Volume :
2
fYear :
1997
fDate :
21-24 April 1997
Firstpage :
1263
Abstract :
This paper explores the modelling of phonetic segments of speech with multi-resolution spectral/time correlates. For spectral representation, a set of multi-resolution cepstral features is proposed. Cepstral features obtained from a DCT of the log energy-spectrum over the full voice-bandwidth (100-4000 Hz) are combined with higher-resolution features obtained from the DCT of the lower subband (say 100-2100 Hz) and upper subband (2100-4000 Hz) halves. This approach can be extended to several levels of different resolutions. For representation of the temporal structure of speech segments or phonetic units, the conventional cepstral and dynamic cepstral features, which represent speech at the sub-phonetic level, are supplemented by a set of phonetic features that describe the trajectory of speech over the duration of a phonetic unit. A conditional probability model for phonetic and sub-phonetic features is considered. Experiments demonstrate that the inclusion of the segmental features results in a decrease of about 10% in error rates.
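The multi-resolution cepstral idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dct2` and `multires_cepstra`, the 20-channel filter bank, and the coefficient counts `n_full` and `n_sub` are all assumptions made for illustration, and the band split is taken at the middle channel rather than at an exact 2100 Hz boundary.

```python
import numpy as np

def dct2(x, n_coeffs):
    """Orthonormal DCT-II of a 1-D array, truncated to the first n_coeffs terms."""
    n = len(x)
    k = np.arange(n_coeffs)[:, None]          # coefficient index
    m = np.arange(n)[None, :]                 # sample (channel) index
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n_coeffs, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ x)

def multires_cepstra(log_energies, n_full=8, n_sub=4):
    """Concatenate full-band cepstra with cepstra of the lower and upper halves.

    log_energies: log filter-bank energies for one frame, ordered from low to
    high frequency (assumed here to span roughly 100-4000 Hz).
    """
    log_energies = np.asarray(log_energies, dtype=float)
    half = len(log_energies) // 2             # assumed split point between subbands
    full = dct2(log_energies, n_full)         # coarse, full-band resolution
    lower = dct2(log_energies[:half], n_sub)  # finer view of the lower subband
    upper = dct2(log_energies[half:], n_sub)  # finer view of the upper subband
    return np.concatenate([full, lower, upper])

# Hypothetical frame: 20 random log filter-bank energies
rng = np.random.default_rng(0)
frame = rng.normal(size=20)
features = multires_cepstra(frame)            # 8 + 4 + 4 = 16 features
```

Extending to further resolution levels, as the abstract suggests, would amount to splitting each subband again and appending the resulting coefficients.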
Keywords :
acoustic correlation; cepstral analysis; discrete cosine transforms; feature extraction; hidden Markov models; probability; signal representation; signal resolution; speech processing; speech recognition; 100 to 4000 Hz; DCT; HMM; cepstral features; conditional probability model; dynamic cepstral features; feature extraction; full voice-bandwidth; log energy-spectrum; modelling; multi-resolution spectral/time correlates; phonetic segments; spectral representation; speech recognition; speech segments; speech trajectory; sub-phonetic features; temporal structure; Cepstral analysis; Discrete cosine transforms; Energy resolution; Error analysis; Frequency estimation; Hidden Markov models; Pattern classification; Pattern recognition; Signal resolution; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on
Conference_Location :
Munich
ISSN :
1520-6149
Print_ISBN :
0-8186-7919-0
Type :
conf
DOI :
10.1109/ICASSP.1997.596175
Filename :
596175