Title :
Improvements in English ASR for the MALACH project using syllable-centric models
Author :
Sethy, Abhinav ; Ramabhadran, Bhuvana ; Narayanan, Shrikanth
Author_Institution :
Human Language Technol., IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
fDate :
30 Nov.-3 Dec. 2003
Abstract :
LVCSR systems have traditionally used phones as the basic acoustic unit for recognition. Syllables and other longer-length units provide an efficient means for modeling long-term temporal dependencies in speech that are difficult to capture in a phone-based recognition framework. However, it is well known that longer-duration units suffer from training data sparsity problems, since a large number of units in the lexicon will have little or no acoustic training data. Previous research has shown that syllable-based modeling provides improvements over word-internal systems, but performance has lagged behind cross-word context-dependent systems. In this paper, we describe a syllable-centric approach to English LVCSR for the MALACH (Multilingual Access to Large spoken Archives) project. The combined modeling of syllables and context-dependent phones provides a 0.5% absolute improvement in recognition accuracy over the state-of-the-art cross-word system for the heavily accented and spontaneous speech seen in oral history archives. More importantly, we report on the improved recognition of names and concepts, which is crucial for subsequent search and retrieval.
Keywords :
speech processing; speech recognition; English ASR; LVCSR systems; MALACH project; Multilingual Access to Large spoken Archives; concept recognition; context-dependent phones; heavily accented speech; name recognition; oral history archives; performance; retrieval; search; spontaneous speech; syllable-based modeling; syllable-centric models; Automatic speech recognition; Context modeling; Decision trees; History; Humans; Natural languages; Speech recognition; Training data; Vectors; Vocabulary
Conference_Titel :
Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on
Print_ISBN :
0-7803-7980-2
DOI :
10.1109/ASRU.2003.1318416