DocumentCode
1908196
Title
Multi-source neural networks for speech recognition
Author
Gemello, Roberto ; Albesano, Dario ; Mana, Franco
Author_Institution
CSELT, Torino, Italy
Volume
5
fYear
1999
fDate
1999
Firstpage
2946
Abstract
In speech recognition, the most widespread technology (hidden Markov models) is constrained by the assumption of stochastic independence of its input features. This limits the simultaneous use of features derived from the speech signal with different processing algorithms. In contrast, artificial neural networks (ANNs) can incorporate multiple heterogeneous input features, which need not be treated as independent, finding the optimal combination of these features for classification. The purpose of this work is to exploit this characteristic of ANNs to improve speech recognition accuracy through the combined use of input features coming from different sources (different feature extraction algorithms). We integrate two input sources: the Mel-based cepstral coefficients (MFCC) derived from the FFT and the RASTA-PLP cepstral coefficients. The results show that this integration leads to an error reduction of 26% on a telephone-quality test set.
Keywords
feature extraction; hidden Markov models; multilayer perceptrons; neural net architecture; probability; speech recognition; state estimation; Mel based cepstral coefficients; RASTA-PLP cepstral coefficients; heterogeneous input features; multi-source neural networks; telephone quality test set; Artificial neural networks; Cepstral analysis; Feature extraction; Hidden Markov models; Mel frequency cepstral coefficient; Neural networks; Signal processing; Speech processing; Speech recognition; Stochastic processes
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks, 1999. IJCNN '99. International Joint Conference on
Conference_Location
Washington, DC
ISSN
1098-7576
Print_ISBN
0-7803-5529-6
Type
conf
DOI
10.1109/IJCNN.1999.835942
Filename
835942