  • DocumentCode
    2971262
  • Title
    Sub-structure-based estimation of pronunciation proficiency and classification of learners
  • Author
    Suzuki, Masayuki ; Minematsu, Nobuaki ; Luo, Dean ; Hirose, Keikichi
  • Author_Institution
    Univ. of Tokyo, Tokyo, Japan
  • fYear
    2009
  • fDate
    Nov. 13 2009-Dec. 17 2009
  • Firstpage
    574
  • Lastpage
    579
  • Abstract
    Automatic estimation of pronunciation proficiency poses a specific difficulty. How well a learner controls the vocal organs can be estimated from the spectral envelopes of input utterances, but the envelope patterns also vary easily from speaker to speaker. To develop a pedagogically sound method for automatic estimation, envelope changes caused by linguistic factors must be properly separated from those caused by extra-linguistic factors. To this end, in our previous study [1], we proposed a mathematically guaranteed and linguistically valid speaker-invariant representation of pronunciation, called speech structure. Since then, we have also examined that representation for ASR [2], [3], [4] and, through this work, have learned better how to apply speech structures to various tasks. In this paper, we focus on the proficiency estimation experiment reported in [1] and, using our recently proposed techniques for the structures, repeat that experiment under new and different conditions. Here, we use smaller units of structural analysis, speaker-invariant substructures, and relative structural distances between a learner and a teacher. Results show that correlations between human and machine ratings are improved and that the method is far more robust to speaker differences than the widely used GOP scores. Further, we demonstrate that the proposed representation can classify learners purely by pronunciation proficiency, unaffected by their age and gender.
  • Keywords
    classification; computer aided instruction; estimation theory; speaker recognition; speech processing; automatic estimation; input utterances; learner classification; linguistic factors; pronunciation proficiency estimation; speaker-invariant substructures; spectral envelope patterns; speech structure; structural analysis; vocal organs; Automatic control; Automatic speech recognition; Humans; Loudspeakers; Proposals; Robustness;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
  • Conference_Location
    Merano
  • Print_ISBN
    978-1-4244-5478-5
  • Electronic_ISBN
    978-1-4244-5479-2
  • Type
    conf
  • DOI
    10.1109/ASRU.2009.5373275
  • Filename
    5373275