Performance statistics of the HEAR acoustic processor

Author

Baker, Janet MacIver

Author_Institution

IBM T.J. Watson Research Center, Yorktown Heights, N.Y.

Volume

4

fYear

1979

fDate

April 1979

Firstpage

262

Lastpage

265

Abstract

The HEAR acoustic processor combines standard frequency-domain and cycle-synchronous time-domain parameters. Output segments, usually 10 msec. in length, vary dynamically from 0.1 msec. to over 100 msec. to capture significant events in the underlying acoustic phone structure. Segment labels are determined by matching against a set of about 200 automatically selected prototypes. Some statistics on the fraction of segments correctly labeled (from a choice of 52 labels) and their most likely confusions are included. Speech recognition results obtained using the HEAR acoustic processor in conjunction with the training and decoding procedures of the IBM Research Continuous Speech Recognition mainline system are presented. On a set of 125 test sentences (1010 words) of the "New Raleigh Language" (artificial language, 250 word vocabulary, perplexity 7.27), the sentence recognition rate is 100%. On a set of 10 test sentences (282 words) of the "Laser-1000 Language" (natural language, 1000 word vocabulary, perplexity 21.1), the word recognition rate is 80%. Although it is generally difficult to ascribe errors to specific system components, three classes of errors are observed: 1) the correct word is not hypothesized, so no acoustic match is performed (10.3% of words); 2) the correct word is hypothesized, but the search is pruned before longer phrases including it are constructed (6.4%); 3) the correct word is hypothesized, fully matched, and rejected in favor of an incorrect word (3.2%). Errors of the third class consist exclusively of short function words such as "the" and "of" (2.2%) and deleted commas, which are realized acoustically by optional interword pauses (1.0%).
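The error figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary keys and variable names are hypothetical labels chosen here, and the only inputs are the percentages stated in the abstract. It suggests that the three error classes together account for roughly the 20% of words not recognized on the Laser-1000 test set, and that the two sub-categories of the third class add up to that class's total.

```python
# Error classes reported in the abstract for the Laser-1000 test set,
# expressed as percentages of the test words. Key names are hypothetical.
error_classes = {
    "not_hypothesized": 10.3,           # class 1: word never proposed
    "pruned_in_search": 6.4,            # class 2: proposed, then pruned
    "rejected_after_full_match": 3.2,   # class 3: matched, then rejected
}

# Breakdown of the third class, also taken from the abstract.
third_class_parts = {
    "short_function_words": 2.2,
    "deleted_commas": 1.0,
}

# Round to one decimal place to avoid floating-point noise in the sums.
total_error = round(sum(error_classes.values()), 1)
word_accuracy = round(100.0 - total_error, 1)
third_class_total = round(sum(third_class_parts.values()), 1)

print(f"total word error: {total_error}%")
print(f"implied word accuracy: {word_accuracy}%")
print(f"third-class total: {third_class_total}%")
```

The implied accuracy from the itemized errors is close to, but not exactly, the rounded 80% word recognition rate given in the abstract, which is consistent with the headline figure having been rounded.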

Keywords

Acoustic signal processing; Acoustic testing; Acoustic waves; Decoding; Natural languages; Prototypes; Speech recognition; Statistics; Time domain analysis; Vocabulary;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '79.

Type

conf

DOI

10.1109/ICASSP.1979.1170636

Filename

1170636