• DocumentCode
    3034984
  • Title
    Performance statistics of the HEAR acoustic processor
  • Author
    Baker, Janet MacIver
  • Author_Institution
    IBM T.J. Watson Research Center, Yorktown Heights, N.Y.
  • Volume
    4
  • fYear
    1979
  • fDate
    28946
  • Firstpage
    262
  • Lastpage
    265
  • Abstract
    The HEAR acoustic processor combines standard frequency-domain and cycle-synchronous time-domain parameters. Output segments, usually 10 msec. in length, vary dynamically from 0.1 msec. to over 100 msec. to capture significant events in the underlying acoustic phone structure. Segment labels are determined by matching against a set of about 200 automatically selected prototypes. Some statistics on the fraction of segments correctly labeled (from a choice of 52 labels) and their most likely confusions are included. Speech recognition results obtained using the HEAR acoustic processor in conjunction with the training and decoding procedures of the IBM Research Continuous Speech Recognition mainline system are presented. On a set of 125 test sentences (1010 words) of the "New Raleigh Language" (artificial language, 250-word vocabulary, perplexity 7.27), the sentence recognition rate is 100%. On a set of 10 test sentences (282 words) of the "Laser-1000 Language" (natural language, 1000-word vocabulary, perplexity 21.1), the word recognition rate is 80%. Although it is generally difficult to ascribe errors to specific system components, three classes of errors are observed: 1) the correct word is not hypothesized, so no acoustic match is performed (10.3% of words); 2) the correct word is hypothesized, but the search is pruned before longer phrases including it are constructed (6.4%); 3) the correct word is hypothesized, fully matched, and rejected in favor of an incorrect word (3.2%). Errors of the third class consist exclusively of short function words (e.g. "the", "of"), 2.2%, and deleted commas (realized acoustically by optional interword pauses), 1.0%.
  • Keywords
    Acoustic signal processing; Acoustic testing; Acoustic waves; Decoding; Natural languages; Prototypes; Speech recognition; Statistics; Time domain analysis; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '79.
  • Type
    conf
  • DOI
    10.1109/ICASSP.1979.1170636
  • Filename
    1170636