Regional Information Center for Science and Technology (RICEST) - Why word error rate is not a good metric for speech recognizer training for the speech translation task?

DocumentCode :

2181004

Title :

Why word error rate is not a good metric for speech recognizer training for the speech translation task?

Author :

He, Xiaodong ; Deng, Li ; Acero, Alex

Author_Institution :

Microsoft Res., Redmond, WA, USA

fYear :

2011

fDate :

22-27 May 2011

Firstpage :

5632

Lastpage :

5635

Abstract :

Speech translation (ST) is an enabling technology for cross-lingual oral communication. An ST system consists of two major components: an automatic speech recognizer (ASR) and a machine translator (MT). Nowadays, most ASR systems are trained and tuned by minimizing word error rate (WER). However, WER counts word errors at the surface level. It does not consider the contextual and syntactic roles of a word, which are often critical for MT. In end-to-end ST scenarios, whether WER is a good metric for the ASR component of the full ST system remains an open question that has not been studied systematically. In this paper, we report our recent investigation of this issue, focusing on the interactions of ASR and MT in an ST system. We show that BLEU-oriented global optimization of ASR system parameters improves translation quality by an absolute 1.5% BLEU score over the conventional, WER-optimized ASR system, at the cost of a worse WER. We also conducted an in-depth study of the impact of ASR errors on the final ST output. Our findings suggest that the speech recognizer component of the full ST system should be optimized by translation metrics instead of the traditional WER.
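
The abstract's remark that WER counts errors only at the surface level can be illustrated with a small sketch. The code below is not taken from the paper: it is a minimal, assumed implementation of WER as word-level edit distance divided by reference length, and the function name `word_error_rate` and the example sentences are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Invented example sentences (not from the paper).
reference = "do not turn left at the station"
drops_negation = "do turn left at the station"      # meaning is reversed
swaps_article = "do not turn left at a station"     # meaning is preserved

wer_negation = word_error_rate(reference, drops_negation)
wer_article = word_error_rate(reference, swaps_article)
```

Both hypotheses contain exactly one word error against a seven-word reference, so they receive the same WER, although only the first one is likely to mislead a downstream translator. This is the kind of mismatch that motivates tuning the recognizer against a translation metric such as BLEU.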

Keywords :

language translation; optimisation; speech recognition; training; ASR component; BLEU oriented global optimization; ST system; WER optimized ASR system; crosslingual oral communication; end-to-end ST scenario; machine translator; speech recognizer training; speech translation; word error rate; Computational modeling; Hidden Markov models; Measurement; Optimization; Speech; Speech recognition; Training; BLEU score optimization; Speech translation; log-linear model; machine translation; speech recognition; translation metric; word error rate;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on

Conference_Location :

Prague

ISSN :

1520-6149

Print_ISBN :

978-1-4577-0538-0

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2011.5947637

Filename :

5947637

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2181004