• DocumentCode
    3433791
  • Title

    Incorporating information from syllable-length time scales into automatic speech recognition

  • Author

    Wu, Su-Lin ; Kingsbury, Brian E D ; Morgan, Nelson ; Greenberg, Steven

  • Author_Institution
    Int. Comput. Sci. Inst., Berkeley, CA, USA
  • Volume
    2
  • fYear
    1998
  • fDate
    12-15 May 1998
  • Firstpage
    721
  • Abstract
    Including information distributed over intervals of syllabic duration (100-250 ms) may greatly improve the performance of automatic speech recognition (ASR) systems. ASR systems primarily use representations and recognition units covering phonetic durations (40-100 ms). Humans certainly use information at phonetic time scales, but results from psychoacoustics and psycholinguistics highlight the crucial role of the syllable, and syllable-length intervals, in speech perception. We compare the performance of three ASR systems: a baseline system that uses phone-scale representations and units, an experimental system that uses a syllable-oriented front-end representation and syllabic units for recognition, and a third system that combines the phone-scale and syllable-scale recognizers by merging and rescoring N-best lists. Using the combined recognition system, we observed an improvement in word error rate for telephone-bandwidth, continuous numbers from 6.8% to 5.5% on a clean test set, and from 27.8% to 19.6% on a reverberant test set, over the baseline phone-based system.
  • Keywords
    decoding; error statistics; feature extraction; pattern classification; signal representation; speech intelligibility; speech processing; speech recognition; 100 to 250 ms; 40 to 100 ms; ASR systems; N-best lists; automatic speech recognition; baseline phone-based system; clean test set; combined recognition system; continuous numbers; experimental system; performance; phone-scale representations; phonetic time scales; psychoacoustics; psycholinguistics; recognition units; reverberant test set; speech decoding; speech perception; speech unit classification; syllabic duration; syllable-length time scales; syllable-oriented front-end representation; syllable-scale recognizers; telephone-bandwidth; word error rate; Automatic speech recognition; Computer science; Error analysis; Humans; Merging; Psychoacoustics; Psychology; Speech processing; Speech recognition; System testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on
  • Conference_Location
    Seattle, WA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-4428-6
  • Type

    conf

  • DOI
    10.1109/ICASSP.1998.675366
  • Filename
    675366