Regional Information Center for Science and Technology (RICeST) - Large vocabulary speech recognition with multispan statistical language models

DocumentCode :

1290650

Title :

Large vocabulary speech recognition with multispan statistical language models

Author :

Bellegarda, Jerome R.

Author_Institution :

Spoken Language Group, Apple Comput. Inc., Cupertino, CA, USA

Volume :

Issue :

fYear :

2000

fDate :

1/1/2000

Firstpage :

Lastpage :

Abstract :

Multispan language modeling refers to the integration of various constraints, both local and global, present in the language. It was recently proposed to capture global constraints through the use of latent semantic analysis, while taking local constraints into account via the usual n-gram approach. This has led to several families of data-driven, multispan language models for large vocabulary speech recognition. Because of the inherent complementarity in the two types of constraints, the multispan performance, as measured by perplexity, has been shown to compare favorably with the corresponding n-gram performance. The objective of this work is to characterize the behavior of such multispan modeling in actual recognition. Major implementation issues are addressed, including search integration and context scope selection. Experiments are conducted on a subset of the Wall Street Journal (WSJ) speaker-independent, 20,000-word vocabulary, continuous speech task. Results show that, compared to standard n-gram, the multispan framework can lead to a reduction in average word error rate of over 20%. The paper concludes with a discussion of intrinsic multispan tradeoffs, such as the influence of training data selection on the resulting performance.

Keywords :

computational linguistics; natural languages; speech recognition; statistical analysis; vocabulary; Wall Street Journal; context scope selection; data-driven multispan language models; global constraints; large vocabulary speech recognition; latent semantic analysis; multispan statistical language models; n-gram approach; search integration; speaker-independent; word error rate; Character recognition; Data mining; Error analysis; Glass; Natural languages; Predictive models; Speech analysis; Speech recognition; Training data; Vocabulary;

fLanguage :

English

Journal_Title :

Speech and Audio Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1063-6676

Type :

jour

DOI :

10.1109/89.817455

Filename :

817455

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1290650