A time-frequency segmental neural network for phoneme recognition

Author

Basu, Anjan ; Svendsen, Torbjorn

Author_Institution

Dept. of Telecommun., Norwegian Inst. of Technol., Trondheim, Norway

Volume

1

fYear

1993

fDate

27-30 April 1993

Firstpage

509

Abstract

The authors propose a time-frequency segmental neural network (TFSNN) that classifies phonemes according to the two-dimensional time-frequency distribution of the whole phonetic segment. Its architecture resembles those used for optical character recognition (OCR) and provides local shift invariance along both the time and frequency axes. The TFSNN can be used in place of a segmental neural network (SNN) in a hybrid hidden Markov model (HMM)/artificial neural network (ANN) system for automatic speech recognition, as it performs significantly better than the SNN. Training time is also shorter, because the TFSNN uses far fewer connection weights than the SNN.
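
The two properties named in the abstract, few connection weights and local shift invariance along both axes, can be illustrated with a small sketch. It is not the authors' network: the 8x8 patch size, the single 3x3 shared kernel, the 2x2 max pooling, and all function names are assumptions chosen for illustration, in the general style of OCR-type shared-weight networks.

```python
# Illustrative sketch only. Sizes, kernel values, pooling, and names are
# assumptions; none of them come from the paper.

def conv2d_valid(patch, kernel):
    """Slide one shared kernel over a time-frequency patch (valid positions only)."""
    kt, kf = len(kernel), len(kernel[0])
    out = []
    for t in range(len(patch) - kt + 1):
        row = []
        for f in range(len(patch[0]) - kf + 1):
            row.append(sum(kernel[i][j] * patch[t + i][f + j]
                           for i in range(kt) for j in range(kf)))
        out.append(row)
    return out

def max_pool(fmap, size):
    """Non-overlapping max pooling along both the time and frequency axes."""
    return [[max(fmap[t + i][f + j] for i in range(size) for j in range(size))
             for f in range(0, len(fmap[0]) - size + 1, size)]
            for t in range(0, len(fmap) - size + 1, size)]

def make_patch(n_frames, n_bands, t0, f0):
    """Hypothetical segment: all zeros except one unit of energy at (t0, f0)."""
    patch = [[0.0] * n_bands for _ in range(n_frames)]
    patch[t0][f0] = 1.0
    return patch

N_FRAMES, N_BANDS = 8, 8                 # assumed segment size (time x frequency)
KERNEL = [[1.0] * 3 for _ in range(3)]   # one 3x3 kernel shared by every position

def features(t0, f0):
    """Shared-weight convolution followed by 2x2 pooling."""
    return max_pool(conv2d_valid(make_patch(N_FRAMES, N_BANDS, t0, f0), KERNEL), 2)

base = features(2, 2)
small_shift = features(3, 3)   # one step later in time and one band higher
large_shift = features(5, 5)   # three steps along each axis

# Weight count: one shared kernel versus a fully connected layer that maps
# the same 8x8 input onto the same 6x6 grid of hidden units.
shared_weights = len(KERNEL) * len(KERNEL[0])
n_hidden = (N_FRAMES - 2) * (N_BANDS - 2)
dense_weights = N_FRAMES * N_BANDS * n_hidden

print("small shift leaves pooled features unchanged:", base == small_shift)
print("large shift leaves pooled features unchanged:", base == large_shift)
print("shared weights:", shared_weights, "fully connected weights:", dense_weights)
```

In this toy setting the pooled features are identical for the one-step shift but differ for the larger one, which is what "local" shift invariance means here; the invariance depends on where the pattern falls relative to the pooling grid and is not guaranteed for every small shift. The weight comparison is against a generic fully connected layer, not against the SNN evaluated in the paper.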

Keywords

hidden Markov models; learning (artificial intelligence); neural nets; speech recognition; time-frequency analysis; automatic speech recognition; connection weights; hidden Markov model; local shift invariance; network architecture; performance; phoneme recognition; time-frequency segmental neural network; training times

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on

Conference_Location

Minneapolis, MN, USA

ISSN

1520-6149

Print_ISBN

0-7803-7402-9

Type

conf

DOI

10.1109/ICASSP.1993.319167

Filename

319167