Title :
Two-stage feature compensation of clean and telephone speech signals employing bidirectional neural network
Author :
Esmaili, Iman ; Vali, Mansour ; Kabudian, Jahanshah
Author_Institution :
Shahed Univ., Tehran, Iran
Abstract :
In this paper, we continue our previous work on nonlinear feature compensation of distortions in clean and telephone speech recognition systems. We have previously shown that a Bidirectional Neural Network (Bidi-NN) can compensate nonlinearly distorted components of feature vectors. In this study, we aim to improve recognition accuracy on clean and telephone speech data by employing a two-stage feature compensation technique that recovers Log-Filter Bank Energies (LFBE) which are optimal from a classification point of view. These new features are obtained by training a new Bidi-NN with the compensated features, taking the compensated features as the input data to the Bidi-NN. We also obtain MFCC features by applying the discrete cosine transform (DCT) to the compensated LFBE features. HMM phone models are trained on these modified features. Using the two-stage compensated features, we obtained absolute improvements of 4.73% and 9.29% in phone recognition accuracy over the baseline system in clean and telephone conditions, respectively. We also obtained an absolute improvement of 25.67% in phone recognition accuracy for the system trained on clean data but tested on telephone data. These results demonstrate the effectiveness of NN-based nonlinear compensation of speech feature vectors in HMM-based speech recognition systems.
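The DCT step mentioned in the abstract (LFBE to MFCC) can be illustrated with a minimal sketch. The abstract does not give the number of filters, the number of cepstral coefficients, or the DCT scaling, so the function name `lfbe_to_mfcc`, the default `num_ceps=13`, and the orthonormal type-II DCT used here are assumptions for illustration only, not the authors' configuration.

```python
import numpy as np

def lfbe_to_mfcc(lfbe, num_ceps=13):
    """Apply an orthonormal type-II DCT along the filter-bank axis.

    lfbe: array of shape (num_frames, num_filters) holding log filter-bank
          energies (for example, the compensated LFBE features).
    num_ceps: number of cepstral coefficients to keep (assumed value).
    Returns an array of shape (num_frames, num_ceps).
    """
    lfbe = np.asarray(lfbe, dtype=float)
    num_filters = lfbe.shape[1]
    if num_ceps > num_filters:
        raise ValueError("num_ceps must not exceed the number of filters")

    n = np.arange(num_filters)           # filter index
    k = np.arange(num_ceps)[:, None]     # cepstral index, as a column
    # DCT-II basis: cos(pi * k * (2n + 1) / (2N)), shape (num_ceps, num_filters)
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * num_filters))

    # Orthonormal scaling: sqrt(1/N) for k = 0, sqrt(2/N) otherwise
    scale = np.full(num_ceps, np.sqrt(2.0 / num_filters))
    scale[0] = np.sqrt(1.0 / num_filters)

    return lfbe @ (basis * scale[:, None]).T
```

As a sanity check, a frame whose log energies are all equal carries no spectral shape, so only the zeroth coefficient is nonzero after the transform.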
Keywords :
acoustic distortion; discrete cosine transforms; hidden Markov models; neural nets; speech recognition; HMM phone model; baseline system; bidirectional neural network; clean speech signal; discrete cosine transform; feature compensation technique; feature vector; hidden Markov model; log filter bank energies; mel-frequency cepstrum; phone recognition accuracy; telephone speech recognition system; Adaptation model; Artificial neural networks; Hidden Markov models; Lead; Mel frequency cepstral coefficient; Bidirectional neural network (Bidi-NN); hidden Markov model; robust speech recognition; telephone speech recognition;
Conference_Title :
Information Sciences Signal Processing and their Applications (ISSPA), 2010 10th International Conference on
Conference_Location :
Kuala Lumpur
Print_ISBN :
978-1-4244-7165-2
DOI :
10.1109/ISSPA.2010.5605494