Acoustic model building based on non-uniform segments and bidirectional recurrent neural networks

Author

Schuster, Mike

Author_Institution

ATR Interpreting Telephony Res. Labs., Kyoto, Japan

Volume

4

fYear

1997

fDate

21-24 Apr 1997

Firstpage

3249

Abstract

A new framework for acoustic model building is presented. It is based on non-uniform segment models, which are learned and scored with a time-bidirectional recurrent neural network. Neural networks in speech recognition systems are usually used to estimate posterior “frame to phoneme” probabilities; here they are used to estimate “segment to phoneme” probabilities directly, which results in an improved duration model. The special MAP approach allows incorporation of long-term dependencies not only on the acoustic side but also on the phone (output) side, which automatically results in parameter-efficient context-dependent models. While the use of neural networks as frame or phoneme classifiers always results in discriminative training for the acoustic information, the MAP approach presented also incorporates discriminative training for the internally learned phoneme language model. Classification tests for the TIMIT phoneme database gave promising results of 77.75 (82.38)% for the full test data set with all 61 (39) symbols.
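
The difference between scoring single frames and scoring a whole segment can be illustrated with a minimal sketch. It is not the paper's architecture or training procedure: the tiny tanh recurrent layers, the random weights, the concatenation of the final forward and backward states, and all names (`rnn_pass`, `segment_posterior`, `params`) are assumptions made only for illustration. The sketch shows one way a bidirectional recurrent pass can map a variable-length segment of frames to a single posterior distribution over classes.

```python
import math
import random

def rnn_pass(frames, w_in, w_rec):
    """Run a simple tanh recurrent layer over the frames; return the final state."""
    hidden = [0.0] * len(w_rec)
    for frame in frames:
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[j], frame))
                + sum(w * h for w, h in zip(w_rec[j], hidden))
            )
            for j in range(len(w_rec))
        ]
    return hidden

def softmax(scores):
    """Normalize scores into probabilities (shifted by the max for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def segment_posterior(frames, params):
    """Return one distribution over classes for the whole segment (hypothetical setup)."""
    fwd = rnn_pass(frames, params["w_in_f"], params["w_rec_f"])        # past to future
    bwd = rnn_pass(frames[::-1], params["w_in_b"], params["w_rec_b"])  # future to past
    state = fwd + bwd  # concatenate both summaries of the segment
    scores = [sum(w * s for w, s in zip(row, state)) for row in params["w_out"]]
    return softmax(scores)

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Illustrative sizes only; real systems use far larger feature and class counts.
rng = random.Random(0)
FEATS, HIDDEN, CLASSES = 3, 4, 5
params = {
    "w_in_f": random_matrix(HIDDEN, FEATS, rng),
    "w_rec_f": random_matrix(HIDDEN, HIDDEN, rng),
    "w_in_b": random_matrix(HIDDEN, FEATS, rng),
    "w_rec_b": random_matrix(HIDDEN, HIDDEN, rng),
    "w_out": random_matrix(CLASSES, 2 * HIDDEN, rng),
}

# Segments of different lengths each yield exactly one class distribution.
short_segment = [[rng.gauss(0, 1) for _ in range(FEATS)] for _ in range(4)]
long_segment = [[rng.gauss(0, 1) for _ in range(FEATS)] for _ in range(11)]
post_short = segment_posterior(short_segment, params)
post_long = segment_posterior(long_segment, params)
```

Because the output is computed once per segment rather than once per frame, the segment length influences the posterior implicitly, which is one way to read the abstract's remark about an improved duration model.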

Keywords

acoustic signal processing; feature extraction; learning (artificial intelligence); maximum likelihood estimation; pattern classification; recurrent neural nets; speech processing; speech recognition; TIMIT phoneme database; acoustic model building; bidirectional recurrent neural networks; classification tests; discriminative training; duration model; frame classifiers; long term dependencies; nonuniform segments; parameter efficient context dependent models; phoneme classifiers; phoneme language model; segment to phoneme probabilities; speech recognition systems; test data set; Acoustic testing; Databases; Error analysis; Merging; Neural networks; Pattern recognition; Probability; Recurrent neural networks; Statistical analysis

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on

Conference_Location

Munich

ISSN

1520-6149

Print_ISBN

0-8186-7919-0

Type

conf

DOI

10.1109/ICASSP.1997.595486

Filename

595486