Detection of ambiguous portions of signal corresponding to OOV words or misrecognized portions of input

Author

Lacouture, Roxane ; Normandin, Yves

Author_Institution

CRIM, Montreal, Que., Canada

Volume

4

fYear

1996

fDate

3-6 Oct 1996

Firstpage

2071

Abstract

One of the key problems for large vocabulary ASR is the detection of unknown or misrecognized portions of the input. The paper presents results obtained using a local rejection algorithm. The algorithm is derived from the two-pass recognition algorithm of H. Murveit et al. (1993) and detects misrecognized portions based on the number of active words per frame during the second pass. The hypothesis underlying the algorithm is that recognition on unexpected data, i.e. noise or out-of-vocabulary (OOV) words, is likely to activate more words, since no word matches the data well; conversely, when the match is good, fewer words should be active. The algorithm was tried on part of the WSJ 5K November 1993 test (3370 words in total, with no OOV words) and on the digit-strings-only Macrophone data (14686 words, of which 895 were OOV). The results indicate that the approach is promising, both for the detection of OOV words and for the detection of misrecognized portions of the input. It may provide a basis on which to build tools for dealing with these phenomena. These tools might include dialogue mechanisms based on the list of activated words corresponding to a rejected portion, display mechanisms such as reverse video, or rescoring schemes.
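
The idea stated in the abstract, that frames with unusually many active words may mark OOV or misrecognized input, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name flag_ambiguous_regions, the fixed threshold, and the min_frames parameter are all assumptions made for illustration, and the abstract does not say how the per-frame counts are actually turned into a rejection decision.

```python
from typing import List, Tuple


def flag_ambiguous_regions(
    active_counts: List[int],
    threshold: int,
    min_frames: int = 1,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame spans, end exclusive, in which the number
    of active words per frame stays above `threshold`.

    `threshold` and `min_frames` are hypothetical tuning parameters; the
    source abstract does not specify a decision rule.
    """
    regions: List[Tuple[int, int]] = []
    start = None  # index of the first frame of the current candidate span
    for i, count in enumerate(active_counts):
        if count > threshold:
            if start is None:
                start = i
        elif start is not None:
            # The span ended at frame i; keep it only if it is long enough.
            if i - start >= min_frames:
                regions.append((start, i))
            start = None
    # Close a span that runs to the end of the utterance.
    if start is not None and len(active_counts) - start >= min_frames:
        regions.append((start, len(active_counts)))
    return regions


# Made-up per-frame counts of active words, for illustration only.
counts = [3, 4, 12, 15, 14, 5, 2, 20, 3]
print(flag_ambiguous_regions(counts, threshold=10, min_frames=2))
```

With these made-up counts, frames 2 to 4 form one flagged span, while the isolated spike at frame 7 is dropped because it is shorter than min_frames. In practice the threshold would presumably have to be tuned on held-out data, which is likewise an assumption rather than something stated in the abstract.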

Keywords

speech processing; speech recognition; word processing; Macrophone data; OOV words; activated words; active words; ambiguous portions; automatic speech recognition; dialogue mechanisms; digit strings; large vocabulary ASR; local rejection algorithm; misrecognized portions; out of vocabulary words; rescoring schemes; reverse video; signal detection; two pass recognition algorithm; unexpected data; Acoustic beams; Acoustic signal detection; Active noise reduction; Automatic speech recognition; Displays; Educational institutions; Lattices; Testing; Viterbi algorithm; Vocabulary;

fLanguage

English

Publisher

IEEE

Conference_Title

Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on

Conference_Location

Philadelphia, PA

Print_ISBN

0-7803-3555-4

Type

conf

DOI

10.1109/ICSLP.1996.607209

Filename

607209