Title :
Automatic Accent Assessment Using Phonetic Mismatch and Human Perception
Author :
William, Freddy ; Sangwan, Abhijeet ; Hansen, John H. L.
Author_Institution :
Center for Robust Speech Syst. (CRSS), Univ. of Texas at Dallas, Richardson, TX, USA
Abstract :
In this study, a new algorithm for automatic accent evaluation of native and non-native speakers is presented. The proposed system consists of two main steps: alignment and scoring. In the alignment step, the speech utterance is processed using a Weighted Finite State Transducer (WFST) based technique to automatically estimate the pronunciation mismatches (substitutions, deletions, and insertions). Subsequently, in the scoring step, two scoring systems that utilize the pronunciation mismatches from the alignment step are proposed: (i) a WFST-scoring system to measure the degree of accentedness on a scale from -1 (non-native like) to +1 (native like), and (ii) a Maximum Entropy (ME) based technique to assign perceptually motivated scores to pronunciation mismatches. The accent scores provided by the WFST-scoring system and the ME scoring system are termed the WFST and P-WFST (perceptual WFST) accent scores, respectively. The proposed systems are evaluated on American English (AE) spoken by native and non-native (native speakers of Mandarin-Chinese) speakers from the CU-Accent corpus. A listener evaluation with 50 native American English (N-AE) listeners was employed to assist in validating the performance of the proposed accent assessment systems. The proposed P-WFST algorithm shows higher and more consistent correlation with human-evaluated accent scores than the Goodness of Pronunciation (GOP) measure. The proposed solution for accent classification and assessment, based on WFST and P-WFST scores, shows that an effective advancement is possible that correlates well with human perception.
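The alignment-then-scoring idea in the abstract can be illustrated with a minimal sketch. This is not the authors' WFST or ME system: it swaps in a plain Levenshtein (edit-distance) alignment over phone sequences to recover substitutions, deletions, and insertions, and the function names (`align_phones`, `count_mismatches`, `toy_accent_score`) and the linear mapping onto the -1 to +1 range are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, str, str]  # (operation, reference phone, observed phone)


def align_phones(reference: List[str], observed: List[str]) -> List[Op]:
    """Align two phone sequences with unit-cost edit distance.

    Assumption: a stand-in for the WFST-based alignment named in the
    abstract, not a reimplementation of it.
    """
    n, m = len(reference), len(observed)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # delete every reference phone
    for j in range(1, m + 1):
        cost[0][j] = j  # insert every observed phone
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (reference[i - 1] != observed[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right cell to recover the operations.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (
            reference[i - 1] != observed[j - 1]
        ):
            kind = "match" if reference[i - 1] == observed[j - 1] else "sub"
            ops.append((kind, reference[i - 1], observed[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("del", reference[i - 1], ""))
            i -= 1
        else:
            ops.append(("ins", "", observed[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def count_mismatches(ops: List[Op]) -> Dict[str, int]:
    """Tally matches, substitutions, deletions, and insertions."""
    counts = {"match": 0, "sub": 0, "del": 0, "ins": 0}
    for kind, _, _ in ops:
        counts[kind] += 1
    return counts


def toy_accent_score(reference: List[str], observed: List[str]) -> float:
    """Map the mismatch rate linearly onto [-1, +1].

    Hypothetical scoring rule: +1 means no mismatches, -1 means at least
    as many mismatches as reference phones.
    """
    counts = count_mismatches(align_phones(reference, observed))
    errors = counts["sub"] + counts["del"] + counts["ins"]
    rate = min(1.0, errors / max(1, len(reference)))
    return 1.0 - 2.0 * rate


if __name__ == "__main__":
    canonical = ["TH", "IH", "NG", "K"]  # ARPAbet-style phones for "think"
    spoken = ["S", "IH", "NG"]           # made-up accented realization
    print(align_phones(canonical, spoken))
    print(toy_accent_score(canonical, spoken))
```

A perceptually weighted variant in the spirit of P-WFST would replace the unit costs with per-mismatch weights learned from listener judgments; those weights are not given in this record, so they are left out here.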
Keywords :
maximum entropy methods; speech processing; transducers; CU-accent corpus; GOP; ME; Mandarin-Chinese; N-AE; Native American English; P-WFST; accent classification; alignment step system; automatic accent assessment; goodness of pronunciation; human perception; maximum entropy technique; native speaker; nonnative speaker; perceptual weighted finite state transducer; phonetic mismatch; pronunciation mismatch estimation; scoring step system; speech utterance processing; Automatic accent assessment; finite state transducers (FST); maximum entropy models (MEMs); perception based measures; pronunciation scoring;
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2013.2258011