DocumentCode
336728
Title
Large vocabulary speech recognition in French
Author
Adda-Decker, Martine ; Adda, Gilles ; Gauvain, Jean-Luc ; Lamel, Lori
Author_Institution
Lab. d'Informatique pour la Mécanique et les Sci. de l'Ingénieur, CNRS, Orsay, France
Volume
1
fYear
1999
fDate
15-19 Mar 1999
Firstpage
45
Abstract
We present some design considerations concerning our large vocabulary continuous speech recognition system for French. The impact of the epoch of the text training material on lexical coverage, language model perplexity and recognition performance on newspaper texts is demonstrated. The effectiveness of larger vocabulary sizes and larger text training corpora for language modeling is investigated. French is a highly inflected language, which produces large lexical variety and a high homophone rate. About 30% of recognition errors are shown to be due to substitutions between inflected forms of a given root form. When word error rates are analysed as a function of word frequency, a significant increase in the error rate can be measured for frequency ranks above 5000.
Keywords
natural languages; speech recognition; French; continuous speech recognition system; frequency ranks; high homophone rate; inflected language; language model perplexity; language modeling; large vocabulary speech recognition; lexical coverage; newspaper texts; recognition errors; recognition performance; text training corpora; text training material; word error rates; word frequency; Acoustic testing; Error analysis; Frequency measurement; Natural languages; Speech analysis; Speech recognition; System testing; Text recognition; Training data; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on
Conference_Location
Phoenix, AZ
ISSN
1520-6149
Print_ISBN
0-7803-5041-3
Type
conf
DOI
10.1109/ICASSP.1999.758058
Filename
758058