Title :
Pronunciation quality evaluation approach based on bimodal fusion with noise adaptive weight
Author :
Xibin Jia ; Kewei Zhang ; Yanfang Han ; Powers, David
Author_Institution :
Beijing Municipal Key Lab., Beijing Univ. of Technol., Beijing, China
Abstract :
To meet the need of virtual pedagogy applications to evaluate English learners' pronunciation quality, the paper proposes an automatic assessment method based on a bimodal fusion decision algorithm. Pronunciation is scored by measuring the similarity between the learner's and the standard speaker's speech signals, separately for audio and for video. The final pronunciation score is obtained by fusing these two scores with a linear weighted combination. Because visual speech is known to aid audio in human perception, especially in noisy environments, the paper proposes a noise-adaptive weighting strategy for the fusion step. To handle differences in utterance length caused by varying speaking rates, the paper adopts a dynamic warping algorithm to time-align the test utterances with the standard ones. Data selected from the Australia audio and visual speech corpus (AVOZES) are used to test the automatic evaluation system. The experimental results show that, compared with audio-only evaluation, fusing audio and visual speech improves the rationality of the automatic pronunciation assessment system by making full use of the correlated and complementary information in acoustic and visual speech.
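The pipeline described in the abstract (time alignment, per-modality similarity scoring, noise-adaptive linear fusion) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: that the "dynamic warping algorithm" is classic dynamic time warping over 1-D feature sequences, that similarity is 1 / (1 + normalised DTW distance), and that the audio weight is a logistic function of the signal-to-noise ratio. The function names and the `midpoint_db` and `slope` parameters are hypothetical and are not taken from the paper.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences,
    normalised by the combined sequence length."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b frame is repeated
                cost[i][j - 1],      # b advances, a frame is repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m] / (n + m)


def similarity_score(learner, standard):
    """Map the DTW distance to (0, 1]; 1.0 means identical after alignment.
    This mapping is an illustrative assumption, not the paper's formula."""
    return 1.0 / (1.0 + dtw_distance(learner, standard))


def audio_weight(snr_db, midpoint_db=10.0, slope=0.3):
    """Hypothetical noise-adaptive weight: close to 1 for clean audio,
    close to 0 for very noisy audio (logistic curve in the SNR)."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


def fused_score(audio_score, video_score, snr_db):
    """Linear weighted combination of the two per-modality scores."""
    w = audio_weight(snr_db)
    return w * audio_score + (1.0 - w) * video_score


if __name__ == "__main__":
    standard = [1.0, 2.0, 3.0]
    slow_learner = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]  # same contour, half speed
    print(similarity_score(slow_learner, standard))
    print(fused_score(0.2, 0.9, snr_db=-20.0))  # noisy: video dominates
    print(fused_score(0.2, 0.9, snr_db=30.0))   # clean: audio dominates
```

In this sketch a learner who speaks at half speed but follows the same feature contour is not penalised, which is the purpose of the time alignment step, and the fused score leans on the video score as the assumed SNR drops.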
Keywords :
audio signal processing; computer aided instruction; sensor fusion; speech processing; video signal processing; virtual reality; AVOZES; Australia audio and visual speech corpus; English learner; acoustic speech; audio speech signal; automatic assessment method; bimodal fusion decision algorithm; human perception; linear weighting combination approach; noise adaptive weight; pronunciation quality evaluation approach; video speech signal; virtual pedagogy application; visual speech; bimodal fusion; pronunciation evaluation; timing alignment;
Conference_Titel :
Computing and Convergence Technology (ICCCT), 2012 7th International Conference on
Conference_Location :
Seoul
Print_ISBN :
978-1-4673-0894-6