DocumentCode :
2795983
Title :
Effects of acoustic mismatches on speech recognition accuracies due to playback-recorded speech corpus
Author :
Suchato, Atiwong ; Chanjaradwichai, Supadaech ; Kertkeidkachorn, Natthawut ; Vorapatratorn, Surapol ; Hirankan, Pawanrat ; Suri, Teera ; Likitsupin, Krerksak ; Chuetanapinyo, Supakit ; Punyabukkana, Proadpran
Author_Institution :
Dept. of Comput. Eng., Chulalongkorn Univ., Bangkok, Thailand
fYear :
2012
fDate :
16-18 May 2012
Firstpage :
1
Lastpage :
4
Abstract :
Modern speech recognition techniques rely on large amounts of speech data whose acoustic characteristics match the operating environment to train their acoustic models. Gathering training data from loudspeakers playing recorded speech utterances is far more practical than gathering it from human speakers. This paper presents results from speech recognition experiments that provide practical insights into the effects of utterances re-recorded from loudspeakers. A clean-speech corpus of sixty human speakers was built using two different microphones, and their playbacks were re-recorded. Results show that, with minimal lexical constraints, accuracy degraded for playback-trained systems, even with no mismatch between training and test data. However, mismatches did not affect cases with tighter high-level constraints, such as number recognition and limited-vocabulary word recognition. A procedure for reducing the mismatches caused by constructing a corpus from playbacks is introduced. The procedure was shown to bring the accuracy of a playback-trained system 48% closer to that of a system trained with speech from a matched environment.
Keywords :
loudspeakers; microphones; speech recognition; acoustic characteristics; acoustic mismatch effect; acoustic models; clean-speech corpus; limited-vocabulary word recognitions; minimal lexical constraints; playback-recorded speech corpus; playback-trained system; speech data; speech recognition techniques; test data; Accuracy; Acoustics; Hidden Markov models; Speech; Speech processing; Training; Loudspeaker; Speech corpus; Speech recognition accuracy benchmark
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), 2012 9th International Conference on
Conference_Location :
Phetchaburi
Print_ISBN :
978-1-4673-2026-9
Type :
conf
DOI :
10.1109/ECTICon.2012.6254211
Filename :
6254211