• DocumentCode
    1550135
  • Title

    Speech recognition and utterance verification based on a generalized confidence score

  • Author

    Koo, Myoung-Wan ; Lee, Chin-Hui ; Juang, Biing-hwang

  • Author_Institution
    Spoken Language Research Team, Korea Telecom, Seoul, South Korea
  • Volume
    9
  • Issue
    8
  • fYear
    2001
  • fDate
    11/1/2001
  • Firstpage
    821
  • Lastpage
    832
  • Abstract
    In this paper, we introduce a generalized confidence score (GCS) function that enables a framework to integrate different confidence scores in speech recognition and utterance verification. A modified decoder based on the GCS is then proposed. The GCS is defined as a combination of various confidence scores obtained by exponential weighting from various confidence information sources, such as likelihood, likelihood ratio, duration, language model probabilities, etc. We also propose the use of a confidence preprocessor to transform raw scores into manageable terms for easy integration. We consider two kinds of hybrid decoders, an ordinary hybrid decoder and an extended hybrid decoder, as implementation examples based on the generalized confidence score. The ordinary hybrid decoder uses a frame-level likelihood ratio in addition to a frame-level likelihood, while a conventional decoder uses only the frame likelihood or likelihood ratio. The extended hybrid decoder uses not only the frame-level likelihood but also multilevel information such as frame-level, phone-level, and word-level confidence scores based on the likelihood ratios. Our experimental evaluation shows that the proposed hybrid decoders give better results than those obtained by the conventional decoders, especially in dealing with ill-formed utterances that contain out-of-vocabulary words and phrases.
  • Keywords
    maximum likelihood decoding; speech recognition; GCS function; confidence preprocessor; exponential weighting; extended hybrid decoder; frame-level confidence scores; frame-level likelihood ratio; generalized confidence score; ill-formed utterances; multilevel information; ordinary hybrid decoder; out-of-vocabulary phrases; out-of-vocabulary words; phone-level confidence scores; utterance verification; word-level confidence scores; Algorithm design and analysis; Dynamic programming; Hidden Markov models; Laboratories; Maximum likelihood estimation; Multimedia communication; Mutual information; Natural languages
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type

    jour

  • DOI
    10.1109/89.966085
  • Filename
    966085
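
The abstract describes the GCS as a combination of confidence scores obtained by exponential weighting, with a confidence preprocessor that turns raw scores into manageable terms. The record does not give the paper's actual formulas, so the following is only a minimal sketch under stated assumptions: the combination is taken to be a weighted product (equivalently, a weighted sum in the log domain), and the preprocessor is assumed to be a sigmoid that maps a raw score into (0, 1). The function names, the `slope` and `offset` parameters, and the weights are all hypothetical.

```python
import math


def preprocess(raw_score, slope=1.0, offset=0.0):
    """Hypothetical confidence preprocessor (an assumption, not the paper's).

    Squashes an unbounded raw score, such as a log-likelihood ratio,
    into the open interval (0, 1) with a sigmoid.
    """
    return 1.0 / (1.0 + math.exp(-slope * (raw_score - offset)))


def generalized_confidence_score(scores, weights):
    """Combine confidence scores by exponential weighting (assumed form).

    Computes prod(s_i ** w_i) in the log domain for numerical stability.
    Each score must lie in (0, 1]; each weight is a real-valued exponent.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(s <= 0.0 or s > 1.0 for s in scores):
        raise ValueError("each score must lie in (0, 1]")
    log_gcs = sum(w * math.log(s) for s, w in zip(scores, weights))
    return math.exp(log_gcs)


# Illustrative values only: a likelihood-based term and a preprocessed
# likelihood-ratio term, combined with made-up weights.
likelihood_term = 0.8
ratio_term = preprocess(1.2)
gcs = generalized_confidence_score([likelihood_term, ratio_term], [1.0, 0.5])
```

Setting a weight to zero removes that information source from the combined score, which is one way to read the abstract's remark that a conventional decoder uses only the likelihood or only the likelihood ratio.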