DocumentCode :
353653
Title :
Variable word rate N-grams
Author :
Gotoh, Yoshihiko ; Renals, Steve
Author_Institution :
Dept. of Comput. Sci., Sheffield Univ., UK
Volume :
3
fYear :
2000
fDate :
2000
Firstpage :
1591
Abstract :
The rate of occurrence of words is not uniform but varies from document to document. Despite this observation, parameters for conventional N-gram language models are usually derived using the assumption of a constant word rate. In this paper we investigate the use of a variable word rate assumption, modelled by a Poisson distribution or a continuous mixture of Poissons. We present an approach to estimating the relative frequencies of words or N-grams that takes prior information about their occurrences into account. Discounting and smoothing schemes are also considered. On the Broadcast News task, the approach demonstrates a reduction in perplexity of up to 10%.
Keywords :
Poisson distribution; natural languages; smoothing methods; speech processing; speech recognition; Broadcast News task; conventional N-gram language models; discounting schemes; modelling; perplexity reduction; relative frequencies of words; smoothing schemes; variable word rate N-grams; variable word rate assumption; Broadcasting; Computer science; Entropy; Frequency estimation; Information retrieval; Interpolation; Natural languages; Predictive models; Smoothing methods; Statistics
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
Conference_Location :
Istanbul
ISSN :
1520-6149
Print_ISBN :
0-7803-6293-4
Type :
conf
DOI :
10.1109/ICASSP.2000.861992
Filename :
861992