Title :
Syllable-based large vocabulary continuous speech recognition
Author :
Ganapathiraju, Aravind ; Hamaker, Jonathan ; Picone, Joseph ; Ordowski, Mark ; Doddington, George R.
Author_Institution :
Conversay, Redmond, WA, USA
fDate :
5/1/2001 12:00:00 AM
Abstract :
Most large vocabulary continuous speech recognition (LVCSR) systems in the past decade have used a context-dependent (CD) phone as the fundamental acoustic unit. We present one of the first robust LVCSR systems that uses a syllable-level acoustic unit on telephone-bandwidth speech. This effort is motivated by the inherent limitations of phone-based approaches, namely the lack of an easy and efficient way to model long-term temporal dependencies. A syllable unit spans a longer time frame, typically three phones, thereby offering a more parsimonious framework for modeling pronunciation variation in spontaneous speech. We present encouraging results showing that a syllable-based system outperforms a comparable triphone system in both word error rate (WER) and complexity. The WER of the best syllabic system reported here is 49.1% on a standard Switchboard evaluation, a small improvement over the triphone system. We also report results on a much smaller recognition task, OGI Alphadigits, which was used to validate some of the benefits syllables offer over triphones. On this task, the syllable-based system exceeds the performance of the triphone system by nearly 20%, an impressive accomplishment since the alphadigits application consists mostly of phone-level minimal pair distinctions.
Keywords :
computational complexity; speech recognition; OGI Alphadigits; Switchboard evaluation; complexity; context-dependent phone; large vocabulary continuous speech recognition; long-term temporal dependencies; phone-level minimal pair distinctions; pronunciation variation; robust LVCSR systems; spontaneous speech; syllable-based speech recognition; syllable-level acoustic unit; telephone-bandwidth speech; triphone system; word error rate; Acoustics; Error analysis; Hidden Markov models; Information processing; Information technology; Laboratories; Robustness; Signal processing; Vocabulary
Journal_Title :
Speech and Audio Processing, IEEE Transactions on