• Title of article

    Unsupervised training of acoustic models for large vocabulary continuous speech recognition

  • Author/Authors

    H. Ney (author), F. Wessel (author)

  • Issue Information
    Periodical with serial number, year 2004
  • Pages
    -22
  • From page
    23
  • To page
    0
  • Abstract
    For large vocabulary continuous speech recognition systems, the amount of acoustic training data is of crucial importance. In the past, large amounts of speech were thus recorded from various sources and had to be transcribed manually. It is thus desirable to train a recognizer with as little manually transcribed acoustic data as possible. Since untranscribed speech is available in various forms nowadays, the unsupervised training of a speech recognizer on recognized transcriptions is studied in this paper. A low-cost recognizer trained with between one and six h of manually transcribed speech is used to recognize 72 h of untranscribed acoustic data. These transcriptions are then used in combination with a confidence measure to train an improved recognizer. The effect of the confidence measure which is used to detect possible recognition errors is studied systematically. Finally, the unsupervised training is applied iteratively. Starting with only one h of transcribed acoustic data, a recognition system is trained fully automatically. With this iterative training procedure, the word error rates are reduced from 71.3% to 38.3% on the Broadcast News'96 evaluation test set and from 65.6% to 29.3% on the Broadcast News'98 evaluation test set. In comparison with an optimized system trained with the manually generated transcriptions of the complete 72 h training corpus, the word error rates increase by 14.3% relative and 18.6% relative, respectively.
  • Keywords
    Unsupervised training , acoustic models , confidence measure , large vocabulary continuous speech recognition
  • Journal title
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING
  • Serial Year
    2004
  • Record number

    86843