DocumentCode
3034984
Title
Performance statistics of the HEAR acoustic processor
Author
Baker, Janet MacIver
Author_Institution
IBM T.J. Watson Research Center, Yorktown Heights, N.Y.
Volume
4
fYear
1979
fDate
28946
Firstpage
262
Lastpage
265
Abstract
The HEAR acoustic processor combines standard frequency-domain and cycle-synchronous time-domain parameters. Output segments, usually 10 msec. in length, vary dynamically from 0.1 msec. to over 100 msec. to capture significant events in the underlying acoustic phone structure. Segment labels are determined by matching against a set of about 200 automatically selected prototypes. Some statistics on the fraction of segments correctly labeled (from a choice of 52 labels) and their most likely confusions are included. Speech recognition results obtained using the HEAR acoustic processor in conjunction with the training and decoding procedures of the IBM Research Continuous Speech Recognition mainline system are presented. On a set of 125 test sentences (1010 words) of the "New Raleigh Language" (artificial language, 250-word vocabulary, perplexity 7.27), the sentence recognition rate is 100%. On a set of 10 test sentences (282 words) of the "Laser-1000 Language" (natural language, 1000-word vocabulary, perplexity 21.1), the word recognition rate is 80%. Although it is generally difficult to ascribe errors to specific system components, three classes of errors are observed: 1) the correct word is not hypothesized, so no acoustic match is performed (10.3% of words); 2) the correct word is hypothesized, but the search is pruned before longer phrases including it are constructed (6.4%); 3) the correct word is hypothesized, fully matched, and rejected in favor of an incorrect word (3.2%). Errors of the third class consist exclusively of short function words such as "the" and "of" (2.2%) and deleted commas, which are realized acoustically by optional interword pauses (1.0%).
Keywords
Acoustic signal processing; Acoustic testing; Acoustic waves; Decoding; Natural languages; Prototypes; Speech recognition; Statistics; Time domain analysis; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '79.
Type
conf
DOI
10.1109/ICASSP.1979.1170636
Filename
1170636