Regional Information Center for Science and Technology (RICEST) - Effects of acoustic mismatches on speech recognition accuracies due to playback-recorded speech corpus

DocumentCode :

2795983

Title :

Effects of acoustic mismatches on speech recognition accuracies due to playback-recorded speech corpus

Author :

Suchato, Atiwong ; Chanjaradwichai, Supadaech ; Kertkeidkachorn, Natthawut ; Vorapatratorn, Surapol ; Hirankan, Pawanrat ; Suri, Teera ; Likitsupin, Krerksak ; Chuetanapinyo, Supakit ; Punyabukkana, Proadpran

Author_Institution :

Dept. of Comput. Eng., Chulalongkorn Univ., Bangkok, Thailand

fYear :

2012

fDate :

16-18 May 2012

Firstpage :

Lastpage :

Abstract :

Modern speech recognition techniques rely on large amounts of speech data whose acoustic characteristics match the operating environment to train their acoustic models. Gathering training data from loudspeakers playing recorded speech utterances is far more practical than gathering it from human speakers. This paper presents results from speech recognition experiments that provide practical insights into the effects caused by utterances re-recorded from loudspeakers. A clean-speech corpus of sixty human speakers was built using two different microphones, and its playbacks were re-recorded. Results show that, with minimal lexical constraints, accuracy degraded for the playback-trained system, even with no mismatch between training and test data. However, mismatches did not affect cases with tighter high-level constraints, such as number recognition and limited-vocabulary word recognition. A procedure to reduce the mismatches caused by constructing a corpus from playbacks was introduced. The procedure was shown to bring the accuracy of a playback-trained system 48% closer to that of a system trained with speech from a matched environment.

Keywords :

loudspeakers; microphones; speech recognition; acoustic characteristics; acoustic mismatch effect; acoustic models; clean-speech corpus; limited-vocabulary word recognitions; minimal lexical constraints; playback-recorded speech corpus; playback-trained system; speech data; speech recognition techniques; test data; Accuracy; Acoustics; Hidden Markov models; Speech; Speech processing; Training; Loudspeaker; Speech corpus; Speech recognition accuracy benchmark

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), 2012 9th International Conference on

Conference_Location :

Phetchaburi

Print_ISBN :

978-1-4673-2026-9

Type :

conf

DOI :

10.1109/ECTICon.2012.6254211

Filename :

6254211

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2795983