Title :
Recent improvements to the Cambridge Arabic Speech-to-Text systems
Author :
Tomalin, M. ; Diehl, F. ; Gales, M.J.F. ; Park, J. ; Woodland, P.C.
Author_Institution :
Eng. Dept., Cambridge Univ., Cambridge, UK
Abstract :
This paper describes recent improvements to the Cambridge Arabic Large Vocabulary Continuous Speech Recognition (LVCSR) Speech-to-Text (STT) system. It is shown that Multi-Layer Perceptron (MLP) features trained on phonetic targets can improve the performance of both phonemic and graphemic systems. Also, a morphological decomposition scheme is extended from the graphemic domain to the phonetic domain, and particular attention is given to the task of dictionary generation. Finally, the use of Boosted Maximum Mutual Information (BMMI) training is explored both for individual systems and in the context of system combination. The full system results show that the combined use of the above techniques reduces the Word Error Rate (WER) of the best individual system by up to 12% relative, and that the incorporation of morphological decomposition and BMMI within the four individual branches of the combined system reduces the WER by up to 9% relative.
Keywords :
learning (artificial intelligence); multilayer perceptrons; speech recognition; speech synthesis; vocabulary; Cambridge Arabic speech-to-text systems; boosted maximum mutual information training; graphemic systems; large vocabulary continuous speech recognition; morphological decomposition scheme; multilayer perceptron; phonemic systems; word error rate; Context modeling; Dictionaries; Error analysis; Multilayer perceptrons; Mutual information; Performance gain; Speech recognition; Subcontracting; US Government; Vocabulary; Arabic; Boosted MMI; MLP features; Morphological Decomposition; Speech-to-Text;
Conference_Title :
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4244-4295-9
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2010.5495641