DocumentCode :
3034984
Title :
Performance statistics of the HEAR acoustic processor
Author :
Baker, Janet MacIver
Author_Institution :
IBM T.J. Watson Research Center, Yorktown Heights, N.Y.
Volume :
4
fYear :
1979
fDate :
April 1979
Firstpage :
262
Lastpage :
265
Abstract :
The HEAR acoustic processor combines standard frequency-domain parameters with cycle-synchronous time-domain parameters. Output segments, usually 10 msec long, vary dynamically from 0.1 msec to over 100 msec to capture significant events in the underlying acoustic phone structure. Segment labels are determined by matching against a set of about 200 automatically selected prototypes. Statistics are included on the fraction of segments correctly labeled (from a choice of 52 labels) and on their most likely confusions. Speech recognition results are presented for the HEAR acoustic processor used in conjunction with the training and decoding procedures of the IBM Research Continuous Speech Recognition mainline system. On a set of 125 test sentences (1010 words) of the "New Raleigh Language" (artificial language, 250-word vocabulary, perplexity 7.27), the sentence recognition rate is 100%. On a set of 10 test sentences (282 words) of the "Laser-1000 Language" (natural language, 1000-word vocabulary, perplexity 21.1), the word recognition rate is 80%. Although it is generally difficult to ascribe errors to specific system components, three classes of errors are observed: 1) the correct word is not hypothesized, so no acoustic match is performed (10.3% of words); 2) the correct word is hypothesized, but the search is pruned before longer phrases including it are constructed (6.4%); 3) the correct word is hypothesized, fully matched, and rejected in favor of an incorrect word (3.2%). Errors of the third class consist exclusively of short function words such as "the" and "of" (2.2%) and deleted commas, which are realized acoustically by optional interword pauses (1.0%).
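As a quick consistency check on the figures quoted in the abstract, the three error classes can be summed and compared with the reported 80% word recognition rate on the Laser-1000 test set. The sketch below is illustrative only: the variable names are hypothetical, the percentages are copied from the abstract, and it assumes the three classes are disjoint and together account for all word errors, which the abstract implies but does not state outright.

```python
# Error-class percentages as quoted in the abstract (Laser-1000 test set).
# Assumption: the classes are disjoint and cover all word errors.
error_classes = {
    "not_hypothesized": 10.3,      # class 1: correct word never proposed
    "pruned_in_search": 6.4,       # class 2: proposed, then pruned
    "rejected_after_match": 3.2,   # class 3: fully matched, then rejected
}

# Breakdown of class 3 as quoted in the abstract.
third_class_parts = {
    "short_function_words": 2.2,
    "deleted_commas": 1.0,
}

total_error = round(sum(error_classes.values()), 1)
implied_accuracy = round(100.0 - total_error, 1)
third_class_total = round(sum(third_class_parts.values()), 1)

print(f"total word error:      {total_error}%")
print(f"implied word accuracy: {implied_accuracy}%")
print(f"class 3 breakdown sum: {third_class_total}%")
```

Under these assumptions the classes sum to 19.9%, which implies a word accuracy of 80.1% and agrees with the reported 80% after rounding. The two class 3 components also sum to the quoted 3.2%.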
Keywords :
Acoustic signal processing; Acoustic testing; Acoustic waves; Decoding; Natural languages; Prototypes; Speech recognition; Statistics; Time domain analysis; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '79.
Type :
conf
DOI :
10.1109/ICASSP.1979.1170636
Filename :
1170636