Sub-structure-based estimation of pronunciation proficiency and classification of learners

Author

Suzuki, Masayuki ; Minematsu, Nobuaki ; Luo, Dean ; Hirose, Keikichi

Author_Institution

Univ. of Tokyo, Tokyo, Japan

fYear

2009

fDate

Nov. 13 2009-Dec. 17 2009

Firstpage

574

Lastpage

579

Abstract

Automatic estimation of pronunciation proficiency poses a specific difficulty. How well a learner controls the vocal organs can be estimated from the spectral envelopes of input utterances, but envelope patterns also vary readily from speaker to speaker. To develop a pedagogically sound method for automatic estimation, the envelope changes caused by linguistic factors must be properly separated from those caused by extra-linguistic factors. To this end, our previous study [1] proposed a mathematically guaranteed and linguistically valid speaker-invariant representation of pronunciation, called speech structure. Since that proposal, we have also examined the representation for ASR [2], [3], [4] and, through this work, have learned better how to apply speech structures to various tasks. In this paper, we focus on the proficiency estimation experiment of [1] and, using our recently proposed techniques for the structures, repeat that experiment under new and different conditions. Here, we use smaller units of structural analysis, speaker-invariant substructures, and relative structural distances between a learner and a teacher. Results show that correlations between human and machine ratings improve and that the method is far more robust to speaker differences than the widely used GOP scores. Further, we demonstrate that the proposed representation can classify learners purely by pronunciation proficiency, unaffected by their age and gender.

Keywords

classification; computer aided instruction; estimation theory; speaker recognition; speech processing; automatic estimation; input utterances; learner classification; linguistic factors; pronunciation proficiency estimation; speaker-invariant substructures; spectral envelope patterns; speech structure; structural analysis; vocal organs; Automatic control; Automatic speech recognition; Humans; Loudspeakers; Proposals; Robustness

fLanguage

English

Publisher

IEEE

Conference_Title

Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on

Conference_Location

Merano

Print_ISBN

978-1-4244-5478-5

Electronic_ISBN

978-1-4244-5479-2

Type

conf

DOI

10.1109/ASRU.2009.5373275

Filename

5373275