Robust out-of-vocabulary rejection for low-complexity speaker independent speech recognition

Author

Broun, C.C. ; Campbell, W.M.

Author_Institution

Human Interface Lab., Motorola Inc., Tempe, AZ, USA

Volume

3

fYear

2000

fDate

2000

Firstpage

1811

Abstract

With the increased use of speech recognition outside of the lab environment, better out-of-vocabulary (OOV) rejection techniques are critical to the continued success of this user interface. Not only must future speech recognition systems accurately reject OOV utterances, but they must also maintain their performance in mismatched (i.e., noisy) conditions. In this paper, we extend our work on low-complexity, high-accuracy, speaker-independent speech recognition. We present a novel rejection criterion that is shown to be robust in mismatched conditions. This technique continues our emphasis on speech recognition for resource-limited applications by providing a solution that is highly scalable, requiring no additional memory and no significant increase in computation. The technique is based on the use of multiple garbage models (on the order of 100 or more) and a novel ranking method to achieve robust performance. This method allows for a data-dependent approach that optimizes the performance over each class individually. Results are presented for a large database consisting of 166 speakers and 131 classes. Out-of-class rejection is based on 118 out-of-vocabulary phrases and 3 categories of spurious inputs (breath noise, coughs, and lip smacks). Performance is shown to be superior to the approximated optimal Bayes reject rule.

Keywords

computational complexity; pattern classification; speech recognition; OOV utterances rejection; breath noise; coughs; data dependent approach; lipsmack; low-complexity; mismatched conditions; multiple garbage models; noisy conditions; out-of-class rejection; out-of-vocabulary rejection; ranking method; robust performance; speaker independent speech recognition; spurious inputs; user interface; Cost function; Humans; Noise robustness; Optimization methods; Pattern recognition; Polynomials; Speech recognition; Statistical distributions; User interfaces; Working environment noise;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on

Conference_Location

Istanbul

ISSN

1520-6149

Print_ISBN

0-7803-6293-4

Type

conf

DOI

10.1109/ICASSP.2000.862106

Filename

862106