DocumentCode :
2768520
Title :
Development of a phonetic system for large vocabulary Arabic speech recognition
Author :
Gales, M.J.F. ; Diehl, F. ; Raut, C.K. ; Tomalin, M. ; Woodland, P.C. ; Yu, K.
Author_Institution :
Cambridge Univ., Cambridge
fYear :
2007
fDate :
9-13 Dec. 2007
Firstpage :
24
Lastpage :
29
Abstract :
This paper describes the development of an Arabic speech recognition system based on a phonetic dictionary. Though phonetic systems have been previously investigated, this paper makes a number of contributions to the understanding of how to build these systems, as well as describing a complete Arabic speech recognition system. The first issue considered is discriminative training when there are a large number of pronunciation variants for each word. In particular, the loss function associated with minimum phone error (MPE) training is examined. The performance and combination of phonetic and graphemic acoustic models are then compared on both Broadcast News (BN) and Broadcast Conversation (BC) data. The final contribution of the paper is a simple scheme for automatically generating pronunciations for use in training and reducing the phonetic out-of-vocabulary rate. The paper concludes with a description and results from using phonetic and graphemic systems in a multipass/combination framework.
Keywords :
natural languages; speech processing; speech recognition; graphemic system; large vocabulary Arabic speech recognition; minimum phone error training; phonetic system; pronunciation variants; Books; Broadcasting; Contracts; Dictionaries; Joining processes; Natural languages; Speech recognition; Training data; Vocabulary; Arabic; Large vocabulary speech recognition; discriminative training
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4244-1746-9
Electronic_ISBN :
978-1-4244-1746-9
Type :
conf
DOI :
10.1109/ASRU.2007.4430078
Filename :
4430078