Title :
Using bidirectional LSTM recurrent neural networks to learn high-level abstractions of sequential features for automated scoring of non-native spontaneous speech
Author :
Zhou Yu;Vikram Ramanarayanan;David Suendermann-Oeft;Xinhao Wang;Klaus Zechner;Lei Chen;Jidong Tao;Aliaksei Ivanou;Yao Qian
Author_Institution :
Educational Testing Service R&D; Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA
Abstract :
We introduce a new method for automatically grading non-native spoken language tests. Traditional automated response grading approaches use manually engineered, time-aggregated features (such as mean pause length). We propose to incorporate general time-sequence features (such as pitch), which preserve more information than time-aggregated features and require no human effort to design. We use a type of recurrent neural network to jointly optimize the learning of high-level abstractions from time-sequence features together with the time-aggregated features. Specifically, we first learn high-level abstractions from the time-sequence features with a Bidirectional Long Short-Term Memory (BLSTM) network, then combine these abstractions with the time-aggregated features in a Multilayer Perceptron (MLP) or Linear Regression (LR) model, and optimize the BLSTM and the MLP/LR jointly. We find that such models achieve the best performance in terms of correlation with human raters. We also find that when only limited time-aggregated features are available, incorporating time-sequence features improves performance drastically.
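The scoring pipeline the abstract describes, a BLSTM whose final forward and backward states summarize the time-sequence features, concatenated with time-aggregated features and fed to a linear scorer, can be sketched as a forward pass in NumPy. This is a minimal illustration only: the layer sizes, variable names, and random weights are assumptions, the feature values are synthetic, and the joint BLSTM + MLP/LR training described in the paper (backpropagation through both components) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_final_state(x, W, U, b):
    """Run one LSTM direction over x of shape (T, d_in); return the final hidden state."""
    d_h = U.shape[1]
    h = np.zeros(d_h)
    c = np.zeros(d_h)
    for x_t in x:
        z = W @ x_t + U @ h + b                 # pre-activations for all four gates
        i, f, o = (sigmoid(z[k * d_h:(k + 1) * d_h]) for k in range(3))
        g = np.tanh(z[3 * d_h:])                # candidate cell update
        c = f * c + i * g                       # gated cell state
        h = o * np.tanh(c)                      # gated hidden state
    return h

def init_lstm(d_in, d_h):
    """Small random weights for one LSTM direction (input, recurrent, bias)."""
    return (rng.normal(0, 0.1, (4 * d_h, d_in)),
            rng.normal(0, 0.1, (4 * d_h, d_h)),
            np.zeros(4 * d_h))

d_in, d_h, d_agg = 2, 8, 3                      # illustrative dimensions
fw, bw = init_lstm(d_in, d_h), init_lstm(d_in, d_h)
w_out = rng.normal(0, 0.1, 2 * d_h + d_agg)     # linear-regression scoring weights
b_out = 0.0

# One response: T frames of time-sequence features (e.g. pitch, energy per frame)
# plus a small vector of time-aggregated features (e.g. mean pause length).
T = 50
seq = rng.normal(size=(T, d_in))
agg = rng.normal(size=d_agg)

# BLSTM abstraction: final states of the forward and backward passes, concatenated.
abstraction = np.concatenate([lstm_final_state(seq, *fw),
                              lstm_final_state(seq[::-1], *bw)])

# Combine the learned abstraction with the time-aggregated features in an LR scorer.
score = float(w_out @ np.concatenate([abstraction, agg]) + b_out)
print(score)
```

In the paper's setup the gradient of the scoring loss would flow from the MLP/LR back into the BLSTM weights, which is what "jointly optimize" refers to; the sketch above shows only the inference path.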
Keywords :
"Speech","Feature extraction","Recurrent neural networks","Logic gates","Speech recognition","Jitter","Context"
Conference_Titel :
2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
DOI :
10.1109/ASRU.2015.7404814