• DocumentCode
    323561
  • Title

    Multilingual phone recognition of spontaneous telephone speech

  • Author

    Corredor-Ardoy, C. ; Lamel, L. ; Adda-Decker, M. ; Gauvain, J.-L.

  • Author_Institution
    Lab. d'Inf. pour la Mecanique et les Sci. de l'Ingenieur, CNRS, Orsay, France
  • Volume
    1
  • fYear
    1998
  • fDate
    12-15 May 1998
  • Firstpage
    413
  • Abstract
    In this paper we report on experiments with phone recognition of spontaneous telephone speech. Phone recognizers were trained and assessed on IDEAL, a multilingual corpus containing telephone speech in French, British English, German and Castilian Spanish. We investigated the influence of the training material composition (size and linguistic content) on recognition performance using context-independent (CI) hidden Markov models (HMMs) and phonotactic bigram models. We found that when testing on spontaneous speech data, using only spontaneous speech training data gave the highest phone accuracies for the four languages, even though this data comprises only 14% of the available training data. The use of context-dependent (CD) HMMs reduced the phone error across the four languages, with the average error reduced to 51.9% from the 57.4% obtained with CI models. We suggest a straightforward way of detecting non-speech phenomena. The basic idea is to remove sequences of consonants between two silence labels from the recognized phone strings prior to scoring. This simple technique reduces the average phone error rate by 5.4% relative. The lowest phone error with CD models and filtering was obtained for Spanish (39.1%), with the four-language average being 49.1%.
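    The filtering step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the silence label `sil`, the consonant set, and the function names are hypothetical, and the abstract does not say whether the two silence labels left adjacent after filtering are merged (here they are kept). The helper `relative_reduction` simply checks the arithmetic relating the 51.9% and 49.1% averages quoted above.

```python
from typing import List, Set

# Hypothetical labels: the abstract does not give the actual phone inventory.
SILENCE = "sil"
CONSONANTS: Set[str] = {"p", "t", "k", "b", "d", "g", "f", "s", "z", "m", "n", "l", "r"}


def filter_consonant_runs(
    phones: List[str],
    consonants: Set[str] = CONSONANTS,
    silence: str = SILENCE,
) -> List[str]:
    """Drop consonant-only runs that are enclosed by two silence labels."""
    out: List[str] = []
    i, n = 0, len(phones)
    while i < n:
        out.append(phones[i])
        if phones[i] == silence:
            # Scan the run of consonants that follows this silence label.
            j = i + 1
            while j < n and phones[j] in consonants:
                j += 1
            # Remove the run only if it is non-empty and closed by a silence.
            if j > i + 1 and j < n and phones[j] == silence:
                i = j  # the closing silence is emitted on the next iteration
                continue
        i += 1
    return out


def relative_reduction(before: float, after: float) -> float:
    """Relative reduction of an error rate, in percent."""
    return (before - after) / before * 100.0
```

    For example, `filter_consonant_runs(["sil", "s", "t", "sil", "a", "t", "sil"])` drops the isolated `s t` run but keeps `a t`, since that run contains a vowel. Applying the filter before scoring means the removed labels can no longer count as insertions.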
  • Keywords
    hidden Markov models; speech recognition; telephony; British English; Castilian Spanish; French; German; HMM; IDEAL multilingual corpus; context-dependent hidden Markov models; context-independent hidden Markov models; filtering; linguistic content; multilingual phone recognition; non-speech phenomena detection; phonotactic bigram models; relative average phone error rate; spontaneous telephone speech; training material composition; Composite materials; Context modeling; Error analysis; Filtering; Hidden Markov models; Natural languages; Speech recognition; Telephony; Testing; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on
  • Conference_Location
    Seattle, WA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-4428-6
  • Type

    conf

  • DOI
    10.1109/ICASSP.1998.674455
  • Filename
    674455