DocumentCode
310460
Title
Acoustic model building based on non-uniform segments and bidirectional recurrent neural networks
Author
Schuster, Mike
Author_Institution
ATR Interpreting Telephony Res. Labs., Kyoto, Japan
Volume
4
fYear
1997
fDate
21-24 Apr 1997
Firstpage
3249
Abstract
A new framework for acoustic model building is presented. It is based on non-uniform segment models, which are learned and scored with a time-bidirectional recurrent neural network. Whereas neural networks in speech recognition systems are usually used to estimate posterior “frame to phoneme” probabilities, here they directly estimate “segment to phoneme” probabilities, which results in an improved duration model. The special MAP approach allows long-term dependencies to be incorporated not only on the acoustic side but also on the phone (output) side, which automatically yields parameter-efficient context-dependent models. While the use of neural networks as frame or phoneme classifiers always results in discriminative training for the acoustic information, the MAP approach presented also incorporates discriminative training for the internally learned phoneme language model. Classification tests on the TIMIT phoneme database gave promising results of 77.75% (82.38%) on the full test data set with all 61 (39) symbols.
Keywords
acoustic signal processing; feature extraction; learning (artificial intelligence); maximum likelihood estimation; pattern classification; recurrent neural nets; speech processing; speech recognition; TIMIT phoneme database; acoustic model building; bidirectional recurrent neural networks; classification tests; discriminative training; duration model; feature extraction; frame classifiers; long term dependencies; nonuniform segments; parameter efficient context dependent models; phoneme classifiers; phoneme language model; segment to phoneme probabilities; speech recognition; speech recognition systems; test data set; Acoustic testing; Databases; Error analysis; Merging; Neural networks; Pattern recognition; Probability; Recurrent neural networks; Speech recognition; Statistical analysis;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on
Conference_Location
Munich
ISSN
1520-6149
Print_ISBN
0-8186-7919-0
Type
conf
DOI
10.1109/ICASSP.1997.595486
Filename
595486