DocumentCode
323561
Title
Multilingual phone recognition of spontaneous telephone speech
Author
Corredor-Ardoy, C. ; Lamel, L. ; Adda-Decker, M. ; Gauvain, J.-L.
Author_Institution
Lab. d'Inf. pour la Mécanique et les Sci. de l'Ingénieur, CNRS, Orsay, France
Volume
1
fYear
1998
fDate
12-15 May 1998
Firstpage
413
Abstract
In this paper we report on experiments with phone recognition of spontaneous telephone speech. Phone recognizers were trained and assessed on IDEAL, a multilingual corpus containing telephone speech in French, British English, German and Castilian Spanish. We investigated the influence of the composition of the training material (size and linguistic content) on recognition performance using context-independent (CI) hidden Markov models (HMMs) and phonotactic bigram models. We found that, when testing on spontaneous speech, training only on spontaneous speech gave the highest phone accuracy for all four languages, even though these data make up only 14% of the available training material. Context-dependent (CD) HMMs reduced the phone error rate across the four languages, lowering the average from 57.4% with CI models to 51.9%. We also suggest a straightforward way of detecting non-speech phenomena. The basic idea is to remove sequences of consonants between two silence labels from the recognized phone strings prior to scoring. This simple technique reduces the average phone error rate by 5.4% relative. With CD models and filtering, the lowest phone error rate was obtained for Spanish (39.1%), and the four-language average was 49.1%.
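The filtering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions: the silence label `sil`, the toy consonant set, and the helper names (`filter_consonant_runs`, `phone_error_rate`, `without_silence`) are hypothetical. The abstract does not say how silences are scored, so here the silence labels are kept in the filtered string and dropped before scoring. The phone error rate uses the usual Levenshtein alignment, (substitutions + deletions + insertions) divided by the number of reference phones.

```python
from typing import List, Sequence, Set

SILENCE = "sil"  # hypothetical silence label; real phone sets may name it differently


def filter_consonant_runs(phones: Sequence[str], consonants: Set[str],
                          silence: str = SILENCE) -> List[str]:
    """Remove every non-empty run made up only of consonants that lies
    between two silence labels; the silence labels themselves are kept."""
    out: List[str] = []
    i, n = 0, len(phones)
    while i < n:
        out.append(phones[i])
        if phones[i] != silence:
            i += 1
            continue
        # Find the next silence label after position i.
        j = i + 1
        while j < n and phones[j] != silence:
            j += 1
        segment = list(phones[i + 1:j])
        closed = j < n  # True if the run is bounded by a second silence label
        if not (closed and segment and all(p in consonants for p in segment)):
            out.extend(segment)  # keep runs that contain a vowel or are unbounded
        i = j  # resume at the closing silence (or stop at the end of the string)
    return out


def phone_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """(substitutions + deletions + insertions) / len(ref), via Levenshtein alignment."""
    if not ref:
        raise ValueError("reference must contain at least one phone")
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion of reference phone r
                         cur[j - 1] + 1,           # insertion of hypothesis phone h
                         prev[j - 1] + (r != h))   # substitution, or match at no cost
        prev = cur
    return prev[-1] / len(ref)


def without_silence(phones: Sequence[str], silence: str = SILENCE) -> List[str]:
    """Assumption for this sketch: silence labels are not scored."""
    return [p for p in phones if p != silence]


if __name__ == "__main__":
    consonants = {"b", "f", "n", "s", "Z", "R"}  # toy subset, not a real phone inventory
    ref = ["b", "o", "n", "Z", "u", "R"]
    # A noise burst recognized as "f s" between two silences.
    hyp = ["sil", "f", "s", "sil", "b", "o", "n", "Z", "u", "R", "sil"]
    filtered = filter_consonant_runs(hyp, consonants)
    print(phone_error_rate(ref, without_silence(hyp)))       # two insertions over six phones
    print(phone_error_rate(ref, without_silence(filtered)))  # spurious consonants removed
```

In this toy case the unfiltered hypothesis has two insertions against six reference phones, and the filter removes both. A run that contains any vowel, or one that is not closed by a second silence, is left untouched, so genuine speech between pauses is preserved.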
Keywords
hidden Markov models; speech recognition; telephony; British English; Castilian Spanish; French; German; HMM; IDEAL multilingual corpus; context-dependent hidden Markov models; context-independent hidden Markov models; filtering; linguistic content; multilingual phone recognition; non-speech phenomena detection; phonotactic bigram models; relative average phone error rate; spontaneous telephone speech; training material composition; Composite materials; Context modeling; Error analysis; Filtering; Hidden Markov models; Natural languages; Speech recognition; Telephony; Testing; Training data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on
Conference_Location
Seattle, WA
ISSN
1520-6149
Print_ISBN
0-7803-4428-6
Type
conf
DOI
10.1109/ICASSP.1998.674455
Filename
674455