Recognition rate estimation based on word alignment network and discriminative error type classification

Author

Ogawa, Anna ; Hori, Toshikazu ; Nakamura, A.

Author_Institution

NTT Commun. Sci. Labs., NTT Corp., Kyoto, Japan

fYear

2012

fDate

2-5 Dec. 2012

Firstpage

113

Lastpage

118

Abstract

Techniques for estimating recognition rates without reference transcriptions are essential for judging whether speech recognition technology is applicable to a new task. This paper proposes two recognition rate estimation methods for continuous speech recognition. The first is an easy-to-use method based on a word alignment network (WAN), which is obtained from a word confusion network through simple conversion procedures. A WAN contains the correct (C), substitution error (S), insertion error (I), and deletion error (D) probabilities for each word of a recognition result. By summing each of these CSID probabilities over the recognition result, the percent correct and word accuracy (WACC) can be estimated without a reference transcription. The second, more advanced method refines the CSID probabilities provided by a WAN using discriminative error type classification (ETC) and estimates the recognition rates more accurately. In experiments on the MIT lecture speech corpus, we obtained a correlation coefficient of 0.97 between the true WACCs, calculated by a scoring tool using reference transcriptions, and the WACCs estimated from the discriminative ETC results.
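
The summation step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the usual scoring definitions (reference word count N = C + S + D, percent correct = C / N, WACC = (C - I) / N) applied to expected counts, and the names `Csid` and `estimate_rates` are hypothetical. How a WAN is actually derived from a word confusion network is not covered here.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Csid:
    """Probabilities for one WAN segment (hypothetical container).

    c: correct, s: substitution error, i: insertion error, d: deletion error.
    """
    c: float
    s: float
    i: float
    d: float


def estimate_rates(segments: Iterable[Csid]) -> Tuple[float, float]:
    """Estimate (percent correct, word accuracy) in percent.

    Each probability type is summed over all segments to get expected
    C, S, I, and D counts. The expected number of reference words is
    assumed to be C + S + D, as in conventional scoring.
    """
    segs = list(segments)
    exp_c = sum(x.c for x in segs)
    exp_s = sum(x.s for x in segs)
    exp_i = sum(x.i for x in segs)
    exp_d = sum(x.d for x in segs)

    n_ref = exp_c + exp_s + exp_d  # expected reference length
    if n_ref <= 0.0:
        raise ValueError("expected reference length must be positive")

    percent_correct = 100.0 * exp_c / n_ref
    word_accuracy = 100.0 * (exp_c - exp_i) / n_ref
    return percent_correct, word_accuracy


if __name__ == "__main__":
    # Made-up probabilities for three segments, for illustration only.
    example = [
        Csid(c=0.9, s=0.1, i=0.0, d=0.0),
        Csid(c=0.6, s=0.3, i=0.1, d=0.0),
        Csid(c=0.0, s=0.0, i=0.2, d=0.8),
    ]
    pc, wacc = estimate_rates(example)
    print(f"percent correct: {pc:.1f}, WACC: {wacc:.1f}")
```

Because insertions are subtracted in the numerator but do not add to the reference length, the estimated WACC can fall below zero, just as with conventional scoring against a reference.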

Keywords

error statistics; probability; speech recognition; word processing; CSID probabilities; ETC; MIT lecture speech corpus; WACC; WAN; continuous speech recognition technology; conversion procedures; correct and word accuracy; correlation coefficient; deletion error; discriminative error type classification; easy-to-use method; insertion error; recognition rate estimation methods; recognition rate estimation technique; reference transcriptions; scoring tool; substitution error; word alignment network; word confusion network; word-by-word probabilities; Error probability; Estimation; Feature extraction; Speech; Training; Wide area networks; recognition rate estimation

fLanguage

English

Publisher

ieee

Conference_Titel

Spoken Language Technology Workshop (SLT), 2012 IEEE

Conference_Location

Miami, FL

Print_ISBN

978-1-4673-5125-6

Electronic_ISBN

978-1-4673-5124-9

Type

conf

DOI

10.1109/SLT.2012.6424207

Filename

6424207