Regional Information Center for Science and Technology - Syllable-based large vocabulary continuous speech recognition

DocumentCode :

1467901

Title :

Syllable-based large vocabulary continuous speech recognition

Author :

Ganapathiraju, Aravind ; Hamaker, Jonathan ; Picone, Joseph ; Ordowski, Mark ; Doddington, George R.

Author_Institution :

Conversay, Redmond, WA, USA

Volume :

Issue :

fYear :

2001

fDate :

5/1/2001

Firstpage :

358

Lastpage :

366

Abstract :

Most large vocabulary continuous speech recognition (LVCSR) systems in the past decade have used a context-dependent (CD) phone as the fundamental acoustic unit. We present one of the first robust LVCSR systems that uses a syllable-level acoustic unit for LVCSR on telephone-bandwidth speech. This effort is motivated by the inherent limitations in phone-based approaches, namely the lack of an easy and efficient way for modeling long-term temporal dependencies. A syllable unit spans a longer time frame, typically three phones, thereby offering a more parsimonious framework for modeling pronunciation variation in spontaneous speech. We present encouraging results which show that a syllable-based system exceeds the performance of a comparable triphone system both in terms of word error rate (WER) and complexity. The WER of the best syllabic system reported here is 49.1% on a standard Switchboard evaluation, a small improvement over the triphone system. We also report results on a much smaller recognition task, OGI Alphadigits, which was used to validate some of the benefits syllables offer over triphones. The syllable-based system exceeds the performance of the triphone system by nearly 20%, an impressive accomplishment since the alphadigits application consists mostly of phone-level minimal pair distinctions.

Keywords :

computational complexity; speech recognition; OGI Alphadigits; Switchboard evaluation; complexity; context-dependent phone; large vocabulary continuous speech recognition; long-term temporal dependencies; phone-level minimal pair distinctions; pronunciation variation; robust LVCSR systems; spontaneous speech; syllable-based speech recognition; syllable-level acoustic unit; telephone-bandwidth speech; triphone system; word error rate; Acoustics; Error analysis; Hidden Markov models; Information processing; Information technology; Laboratories; Robustness; Signal processing; Speech recognition; Vocabulary;

fLanguage :

English

Journal_Title :

Speech and Audio Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1063-6676

Type :

jour

DOI :

10.1109/89.917681

Filename :

917681

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1467901