Title :
Improved and robust prediction of pronunciation distance for individual-basis clustering of World Englishes pronunciation
Author :
Kasahara, Shun ; Kitahara, Saori ; Minematsu, Nobuaki ; Shen, H.-P. ; Makino, Tatsuya ; Saito, Daisuke ; Hirose, K.
Author_Institution :
Univ. of Tokyo, Tokyo, Japan
Abstract :
English is the only language available for global communication and is used by approximately 1.5 billion speakers. It is also known to have a large diversity of pronunciation, called accents, due to the influence of speakers' mother tongues. Our project aims at creating a global, individual-basis map of English pronunciations to be used in teaching and learning World Englishes (WE) as well as in research studies of WE [1, 2]. Mathematically, creating the map requires a distance matrix of pronunciation differences among all the speakers considered; technically, it requires a method of predicting the pronunciation distance between any pair of speakers using only their speech samples. In our previous study [3], we combined invariant pronunciation structure analysis [4, 5, 6, 7] and Support Vector Regression (SVR) to predict the inter-speaker pronunciation distances. In this paper, several techniques are introduced and examined to determine whether they can increase the accuracy and robustness of the prediction. Experiments show that the correlation between IPA-based reference distances and the predicted distances increases from 0.805 to 0.903, which exceeds the correlation of 0.829 obtained by using the phoneme-based ground truth distances.
Keywords :
natural language processing; pattern clustering; speech processing; English pronunciations; IPA-based reference distances; World Englishes; distance matrix; individual-basis clustering; phoneme-based ground truth distances; pronunciation distance prediction; speech samples; Acoustics; Correlation; Educational institutions; Hidden Markov models; Robustness; Speech; Training; IPA transcription; SAA; World Englishes; f-divergence; phoneme recognition; pronunciation clustering; pronunciation structure analysis; support vector regression;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6854194