DocumentCode
2971262
Title
Sub-structure-based estimation of pronunciation proficiency and classification of learners
Author
Suzuki, Masayuki ; Minematsu, Nobuaki ; Luo, Dean ; Hirose, Keikichi
Author_Institution
Univ. of Tokyo, Tokyo, Japan
fYear
2009
fDate
Dec. 13 2009-Dec. 17 2009
Firstpage
574
Lastpage
579
Abstract
Automatic estimation of pronunciation proficiency poses a specific difficulty. Adequacy in controlling the vocal organs can be estimated from the spectral envelopes of input utterances, but the envelope patterns are also easily affected by speaker differences. To develop a pedagogically sound method for automatic estimation, envelope changes caused by linguistic factors and those caused by extra-linguistic factors should be properly separated. For this purpose, in our previous study [1], we proposed a mathematically guaranteed and linguistically valid speaker-invariant representation of pronunciation, called speech structure. Since then, we have also examined this representation for ASR [2], [3], [4], and through these works we have learned better how to apply speech structures to various tasks. In this paper, we revisit the proficiency estimation experiment done in [1] and, using our recently proposed techniques for the structures, carry it out again under new and different conditions. Here, we use smaller units of structural analysis, speaker-invariant substructures, and relative structural distances between a learner and a teacher. Results show that correlations between human and machine ratings are improved, and that the proposed method is far more robust to speaker differences than the widely used GOP scores. Further, we demonstrate that the proposed representation can classify learners purely on the basis of their pronunciation proficiency, unaffected by their age and gender.
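The abstract describes speech structure only at a high level, so the following is a toy sketch under our own assumptions, not the authors' implementation. We assume each speech event is modeled as a single 2-D Gaussian, the structure is the set of pairwise Bhattacharyya distances between events, a speaker change is an invertible affine map x -> Ax + b of the feature space, and learner and teacher are compared with a plain Euclidean distance between structure vectors (the paper's "relative structural distance" may be defined differently). All function and variable names are hypothetical, and the numbers are made up. The sketch shows why such a structure is speaker-invariant: the Bhattacharyya distance is unchanged by any invertible affine map, so a second "speaker" with identical pronunciation has structural distance zero, while a learner who mispronounces one event does not.

```python
import math
from itertools import combinations

# Toy model (assumption): each speech event is one 2-D Gaussian,
# stored as (mean, cov) with mean = (m0, m1) and cov a 2x2 tuple of tuples.

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a non-singular 2x2 matrix."""
    d = det2(m)
    return ((m[1][1] / d, -m[0][1] / d), (-m[1][0] / d, m[0][0] / d))

def bhattacharyya(g1, g2):
    """Bhattacharyya distance between two Gaussians."""
    (mu1, s1), (mu2, s2) = g1, g2
    # Average covariance of the two distributions.
    s = tuple(tuple((s1[i][j] + s2[i][j]) / 2 for j in range(2)) for i in range(2))
    dmu = (mu1[0] - mu2[0], mu1[1] - mu2[1])
    si = inv2(s)
    quad = sum(dmu[i] * si[i][j] * dmu[j] for i in range(2) for j in range(2))
    return quad / 8 + 0.5 * math.log(det2(s) / math.sqrt(det2(s1) * det2(s2)))

def structure(events):
    """Flattened upper triangle of the event-to-event distance matrix."""
    return [bhattacharyya(a, b) for a, b in combinations(events, 2)]

def structural_distance(st1, st2):
    """Hypothetical comparison: Euclidean distance between two structures."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(st1, st2)))

def affine(g, A, b):
    """Map a Gaussian through x -> A x + b, a toy model of a speaker change."""
    mu, s = g
    new_mu = tuple(sum(A[i][k] * mu[k] for k in range(2)) + b[i] for i in range(2))
    # Covariance transforms as A S A^T.
    new_s = tuple(
        tuple(sum(A[i][k] * s[k][l] * A[j][l] for k in range(2) for l in range(2))
              for j in range(2))
        for i in range(2)
    )
    return new_mu, new_s

# Three made-up events from a "teacher" (illustrative numbers, not data from the paper).
teacher = [
    ((0.0, 0.0), ((1.0, 0.2), (0.2, 0.5))),
    ((2.0, 1.0), ((0.8, 0.0), (0.0, 0.6))),
    ((-1.0, 3.0), ((0.5, -0.1), (-0.1, 0.9))),
]
A, b = ((1.3, 0.4), (-0.2, 0.9)), (0.5, -1.0)  # invertible: det(A) = 1.25

# Same pronunciation, different "speaker": every event goes through the same map.
other_speaker = [affine(g, A, b) for g in teacher]

# A learner who mispronounces the second event, spoken by the other "speaker".
learner_events = [teacher[0], ((0.5, 2.0), teacher[1][1]), teacher[2]]
learner = [affine(g, A, b) for g in learner_events]

print(structural_distance(structure(teacher), structure(other_speaker)))  # ~0 (rounding only)
print(structural_distance(structure(teacher), structure(learner)))        # clearly > 0
```

In this toy setting the raw Gaussians of `teacher` and `other_speaker` differ substantially, yet their structures coincide by construction, which is the property the abstract relies on when it separates linguistic from extra-linguistic variation.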
Keywords
classification; computer aided instruction; estimation theory; speaker recognition; speech processing; automatic estimation; input utterances; learner classification; linguistic factors; pronunciation proficiency estimation; speaker-invariant substructures; spectral envelope patterns; speech structure; structural analysis; vocal organs; Automatic control; Automatic speech recognition; Humans; Loudspeakers; Proposals; Robustness;
fLanguage
English
Publisher
ieee
Conference_Titel
Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
Conference_Location
Merano
Print_ISBN
978-1-4244-5478-5
Electronic_ISBN
978-1-4244-5479-2
Type
conf
DOI
10.1109/ASRU.2009.5373275
Filename
5373275