Title :
Automatic Pronunciation Scoring with Score Combination by Learning to Rank and Class-Normalized DP-Based Quantization
Author :
Chen, Liang-Yu ; Jang, Jyh-Shing Roger
Author_Institution :
Inst. of Inf. Syst. & Applic., Nat. Tsing Hua Univ., Hsinchu, Taiwan
Abstract :
This paper proposes an automatic pronunciation scoring framework that uses learning to rank and class-normalized, dynamic-programming-based quantization. The goal is to train a model that grades the pronunciation of a second-language learner so that the predicted score is as close as possible to the one given by a human teacher. Under this framework, each utterance is given a score from 1 to 5 by human raters, which is treated as the ground-truth rank for the training algorithm. The corpus was rated by qualified English teachers in Taiwan (nonnative speakers). Nine phone-level scores are computed and converted into word-level scores through four conversion methods. We select the 16 best-performing scores as the input features for training the learning-to-rank function. The output of the function is then quantized to a discrete rank on a 1-5 scale. The quantization is done with class normalization to alleviate the problem of data imbalance across classes. Experimental results show that the proposed framework achieves a higher correlation with the human scores than other methods, along with higher accuracy in detecting instances of mispronunciation. We also release a new version of our nonnative corpus with human rankings.
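The quantization step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the cost function or the recurrence, so everything below is an assumption: the names `fit_thresholds` and `predict` are hypothetical, the cost is absolute rank error weighted by inverse class frequency (one plausible reading of "class normalization"), and the dynamic program simply searches for monotone cut points over the sorted ranker outputs. It is not the authors' implementation.

```python
import numpy as np

def fit_thresholds(scores, labels, num_classes=5):
    """Choose num_classes - 1 cut points on `scores` by dynamic programming.

    Assumed cost of predicting class k for a sample with true class y:
    |y - k| / (number of training samples whose true class is y).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)        # values in 1..num_classes
    order = np.argsort(scores, kind="stable")
    s, y = scores[order], labels[order]
    n = len(s)

    counts = np.bincount(y, minlength=num_classes + 1).astype(float)
    weight = 1.0 / counts[y]                      # inverse class frequency
    classes = np.arange(1, num_classes + 1)
    cost = np.abs(y[:, None] - classes[None, :]) * weight[:, None]

    # dp[j, k]: lowest cost of labelling the first j sorted samples with
    # non-decreasing classes, none of them above classes[k].
    dp = np.zeros((n + 1, num_classes))
    for j in range(1, n + 1):
        for k in range(num_classes):
            use_k = dp[j - 1, k] + cost[j - 1, k]   # sample j gets classes[k]
            skip_k = dp[j, k - 1] if k > 0 else np.inf
            dp[j, k] = min(use_k, skip_k)

    # Backtrack to recover the predicted class of each sorted sample.
    pred = np.zeros(n, dtype=int)
    j, k = n, num_classes - 1
    while j > 0:
        if k > 0 and dp[j, k] == dp[j, k - 1]:
            k -= 1
        else:
            pred[j - 1] = classes[k]
            j -= 1

    # Place each cut point midway between the neighbouring sorted scores.
    thresholds = []
    for c in classes[:-1]:
        idx = np.searchsorted(pred, c, side="right")  # samples with pred <= c
        if idx == 0:
            thresholds.append(-np.inf)
        elif idx == n:
            thresholds.append(np.inf)
        else:
            thresholds.append((s[idx - 1] + s[idx]) / 2.0)
    return np.array(thresholds)

def predict(thresholds, scores):
    """Map continuous ranker outputs to discrete ranks 1..len(thresholds)+1."""
    return np.searchsorted(thresholds, np.asarray(scores, dtype=float),
                           side="right") + 1
```

In this sketch, dividing each sample's error by the size of its true class gives rare ranks the same total influence on the cut points as common ones, which is the kind of imbalance correction the abstract refers to. Tied scores that the dynamic program splits across two classes collapse onto a single cut point, so a practical version would need to group ties first.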
Keywords :
computer based training; dynamic programming; learning (artificial intelligence); natural language processing; Taiwan; automatic pronunciation scoring; class-normalized DP-based quantization; dynamic-programming-based quantization; ground truth rank; learning; phone-level score; score combination; training algorithm; word-level score; Correlation; Hidden Markov models; IEEE transactions; Quantization (signal); Speech; Speech processing; Training; computer assisted language learning (CALL); computer assisted pronunciation training (CAPT); learning to rank
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
DOI :
10.1109/TASLP.2015.2449089