DocumentCode
3131937
Title
Recognition rate estimation based on word alignment network and discriminative error type classification
Author
Ogawa, Anna ; Hori, Toshikazu ; Nakamura, A.
Author_Institution
NTT Commun. Sci. Labs., NTT Corp., Kyoto, Japan
fYear
2012
fDate
2-5 Dec. 2012
Firstpage
113
Lastpage
118
Abstract
Techniques for estimating recognition rates without reference transcriptions are essential for judging whether speech recognition technology is applicable to a new task. This paper proposes two recognition rate estimation methods for continuous speech recognition. The first is an easy-to-use method based on a word alignment network (WAN), which is obtained from a word confusion network through simple conversion procedures. A WAN contains the correct (C), substitution error (S), insertion error (I), and deletion error (D) probabilities word by word for a recognition result. By summing each of these CSID probabilities separately, the percent correct and word accuracy (WACC) can be estimated without a reference transcription. The second, more advanced method refines the CSID probabilities provided by a WAN using discriminative error type classification (ETC) and estimates the recognition rates more accurately. In experiments on the MIT lecture speech corpus, we obtained a correlation coefficient of 0.97 between the true WACCs, calculated by a scoring tool using reference transcriptions, and the WACCs estimated from the discriminative ETC results.
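The summation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each WAN segment is given as a dictionary of C, S, I, and D probabilities, and it applies the standard scoring definitions: percent correct = C / N and WACC = (C - I) / N, where N = C + S + D is the expected number of reference words. The function name `estimate_rates` and the input layout are hypothetical.

```python
from typing import Dict, List, Tuple


def estimate_rates(segments: List[Dict[str, float]]) -> Tuple[float, float]:
    """Estimate (percent correct, word accuracy) as fractions in [.., 1].

    Each segment holds the probabilities that the segment is a correct
    word (C), a substitution (S), an insertion (I), or a deletion (D).
    This input layout is an assumption made for illustration.
    """
    # Summing each probability over all segments gives expected counts.
    c = sum(seg.get("C", 0.0) for seg in segments)
    s = sum(seg.get("S", 0.0) for seg in segments)
    i = sum(seg.get("I", 0.0) for seg in segments)
    d = sum(seg.get("D", 0.0) for seg in segments)

    # Expected number of reference words: insertions have no reference word.
    n = c + s + d
    if n <= 0.0:
        raise ValueError("expected reference length is zero")

    percent_correct = c / n
    word_accuracy = (c - i) / n  # can be negative when insertions dominate
    return percent_correct, word_accuracy


# Toy input: three segments with made-up probabilities.
toy_segments = [
    {"C": 0.9, "S": 0.1, "I": 0.0, "D": 0.0},
    {"C": 0.6, "S": 0.2, "I": 0.2, "D": 0.0},
    {"C": 0.0, "S": 0.0, "I": 0.0, "D": 1.0},
]
pc, wacc = estimate_rates(toy_segments)
print(f"percent correct = {pc:.3f}, WACC = {wacc:.3f}")
```

The second method in the abstract would replace the per-segment probabilities with the refined outputs of the ETC classifier before the same summation; that classifier is not sketched here.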
Keywords
error statistics; probability; speech recognition; word processing; CSID probabilities; ETC; MIT lecture speech corpus; WACC; WAN; continuous speech recognition technology; conversion procedures; correct and word accuracy; correlation coefficient; deletion error; discriminative error type classification; easy-to-use method; insertion error; recognition rate estimation methods; recognition rate estimation technique; reference transcriptions; scoring tool; substitution error; word alignment network; word confusion network; word-by-word probabilities; Error probability; Estimation; Feature extraction; Speech; Speech recognition; Training; Wide area networks; Speech recognition; discriminative error type classification; recognition rate estimation; word alignment network;
fLanguage
English
Publisher
IEEE
Conference_Titel
Spoken Language Technology Workshop (SLT), 2012 IEEE
Conference_Location
Miami, FL
Print_ISBN
978-1-4673-5125-6
Electronic_ISBN
978-1-4673-5124-9
Type
conf
DOI
10.1109/SLT.2012.6424207
Filename
6424207