• DocumentCode
    3131937
  • Title
    Recognition rate estimation based on word alignment network and discriminative error type classification
  • Author
    Ogawa, Anna ; Hori, Toshikazu ; Nakamura, A.
  • Author_Institution
    NTT Commun. Sci. Labs., NTT Corp., Kyoto, Japan
  • fYear
    2012
  • fDate
    2-5 Dec. 2012
  • Firstpage
    113
  • Lastpage
    118
  • Abstract
    Techniques for estimating recognition rates without using reference transcriptions are essential if we are to judge whether or not speech recognition technology is applicable to a new task. This paper proposes two recognition rate estimation methods for continuous speech recognition. The first is an easy-to-use method based on a word alignment network (WAN) obtained from a word confusion network through simple conversion procedures. A WAN contains the correct (C), substitution error (S), insertion error (I) and deletion error (D) probabilities, word by word, for a recognition result. By summing these CSID probabilities individually, the percent correct and word accuracy (WACC) can be estimated without using a reference transcription. The second, more advanced method refines the CSID probabilities provided by a WAN using discriminative error type classification (ETC) and estimates the recognition rates more accurately. In experiments on the MIT lecture speech corpus, we obtained a correlation coefficient of 0.97 between the true WACCs, calculated by a scoring tool using reference transcriptions, and the WACCs estimated from the discriminative ETC results.
  • Keywords
    error statistics; probability; speech recognition; word processing; CSID probabilities; ETC; MIT lecture speech corpus; WACC; WAN; continuous speech recognition technology; conversion procedures; correct and word accuracy; correlation coefficient; deletion error; discriminative error type classification; easy-to-use method; insertion error; recognition rate estimation methods; recognition rate estimation technique; reference transcriptions; scoring tool; substitution error; word alignment network; word confusion network; word-by-word probabilities; Error probability; Estimation; Feature extraction; Speech; Speech recognition; Training; Wide area networks; Speech recognition; discriminative error type classification; recognition rate estimation; word alignment network;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop (SLT), 2012 IEEE
  • Conference_Location
    Miami, FL
  • Print_ISBN
    978-1-4673-5125-6
  • Electronic_ISBN
    978-1-4673-5124-9
  • Type
    conf
  • DOI
    10.1109/SLT.2012.6424207
  • Filename
    6424207
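
The abstract's first method (summing the CSID probabilities to estimate percent correct and WACC) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a WAN can be flattened into a list of segments, each a dictionary of C, S, I and D probabilities; that the summed probabilities act as expected counts; and that the usual scoring definitions apply, with the number of reference words N = C + S + D, percent correct = C / N and WACC = (C - I) / N. The function name `estimate_rates` and the data layout are hypothetical.

```python
from typing import Dict, List


def estimate_rates(segments: List[Dict[str, float]]) -> Dict[str, float]:
    """Estimate percent correct and word accuracy from CSID probabilities.

    Each segment is assumed (hypothetically) to map the labels "C", "S",
    "I" and "D" to probabilities; missing labels count as 0.
    """
    # Sum each error-type probability over all segments (expected counts).
    totals = {"C": 0.0, "S": 0.0, "I": 0.0, "D": 0.0}
    for segment in segments:
        for label in totals:
            totals[label] += segment.get(label, 0.0)

    # Expected number of reference words: correct + substituted + deleted.
    n_ref = totals["C"] + totals["S"] + totals["D"]
    if n_ref <= 0.0:
        raise ValueError("expected number of reference words must be positive")

    return {
        "percent_correct": 100.0 * totals["C"] / n_ref,
        "wacc": 100.0 * (totals["C"] - totals["I"]) / n_ref,
    }


# Toy input with made-up probabilities, for illustration only.
toy_wan = [
    {"C": 0.75, "S": 0.25},
    {"C": 0.5, "S": 0.25, "I": 0.25},
    {"D": 0.25},
]
rates = estimate_rates(toy_wan)
print(rates)
```

For the toy input the summed values are C = 1.25, S = 0.5, I = 0.25 and D = 0.25, so N = 2.0, giving an estimated percent correct of 62.5 and an estimated WACC of 50.0. No reference transcription is needed at any point, which is the property the abstract emphasizes.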