Multilingual phone recognition of spontaneous telephone speech

Author

Corredor-Ardoy, C. ; Lamel, L. ; Adda-Decker, M. ; Gauvain, J.-L.

Author_Institution

Lab. d'Inf. pour la Mecanique et les Sci. de l'Ingenieur, CNRS, Orsay, France

Volume

1

fYear

1998

fDate

12-15 May 1998

Firstpage

413

Abstract

In this paper we report on experiments with phone recognition of spontaneous telephone speech. Phone recognizers were trained and assessed on IDEAL, a multilingual corpus containing telephone speech in French, British English, German and Castilian Spanish. We investigated the influence of the composition of the training material (size and linguistic content) on recognition performance using context-independent (CI) hidden Markov models (HMMs) and phonotactic bigram models. We found that when testing on spontaneous speech data, using only spontaneous speech training data gave the highest phone accuracies for all four languages, even though these data comprise only 14% of the available training data. The use of context-dependent (CD) HMMs reduced the phone error rate across the four languages, lowering the average error from the 57.4% obtained with CI models to 51.9%. We suggest a straightforward way of detecting non-speech phenomena: sequences of consonants occurring between two silence labels are removed from the recognized phone strings prior to scoring. This simple technique reduces the average phone error rate by 5.4% relative. The lowest phone error rate with CD models and filtering was obtained for Spanish (39.1%), with the four-language average being 49.1%.
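The consonant-filtering step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the phone inventory or the silence symbol, so the label `sil`, the toy vowel set, and the function name `filter_consonant_runs` are assumptions made only for illustration. The sketch drops any run of phones that lies between two silence labels and contains no vowel, and leaves everything else untouched.

```python
from typing import Iterable, List, Set

# Hypothetical labels: the abstract does not specify the phone set.
SILENCE = "sil"
VOWELS: Set[str] = {"a", "e", "i", "o", "u"}


def filter_consonant_runs(
    phones: Iterable[str],
    silence: str = SILENCE,
    vowels: Set[str] = VOWELS,
) -> List[str]:
    """Remove consonant-only runs enclosed by two silence labels."""
    out: List[str] = []
    pending: List[str] = []   # phones seen since the last silence label
    seen_silence = False      # True once a left-hand silence exists

    for p in phones:
        if p == silence:
            # A run is dropped only if it is bounded by silence on both
            # sides and contains no vowel at all.
            if seen_silence and pending and all(q not in vowels for q in pending):
                pending = []
            out.extend(pending)
            out.append(p)
            pending = []
            seen_silence = True
        else:
            pending.append(p)

    # Trailing phones have no closing silence, so they are kept.
    out.extend(pending)
    return out


if __name__ == "__main__":
    hyp = ["sil", "t", "k", "sil", "o", "l", "a", "sil"]
    print(filter_consonant_runs(hyp))
```

In this sketch the silence labels themselves are kept, so a removed run leaves two adjacent silences. Whether these should be merged before scoring is not stated in the abstract.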

Keywords

hidden Markov models; speech recognition; telephony; British English; Castilian Spanish; French; German; HMM; IDEAL multilingual corpus; context-dependent hidden Markov models; context-independent hidden Markov models; filtering; linguistic content; multilingual phone recognition; non-speech phenomena detection; phonotactic bigram models; relative average phone error rate; spontaneous telephone speech; training material composition; Composite materials; Context modeling; Error analysis; Filtering; Hidden Markov models; Natural languages; Speech recognition; Telephony; Testing; Training data

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on

Conference_Location

Seattle, WA

ISSN

1520-6149

Print_ISBN

0-7803-4428-6

Type

conf

DOI

10.1109/ICASSP.1998.674455

Filename

674455