Title :
Variable word rate N-grams
Author :
Gotoh, Yoshihiko ; Renals, Steve
Author_Institution :
Dept. of Comput. Sci., Sheffield Univ., UK
Abstract :
The rate of occurrence of words is not uniform but varies from document to document. Despite this observation, parameters for conventional N-gram language models are usually derived under the assumption of a constant word rate. In this paper we investigate the use of a variable word rate assumption, modelled by a Poisson distribution or a continuous mixture of Poissons. We present an approach to estimating the relative frequencies of words or N-grams that takes prior information about their occurrences into account. Discounting and smoothing schemes are also considered. On the Broadcast News task, the approach demonstrates a reduction in perplexity of up to 10%.
Keywords :
Poisson distribution; natural languages; smoothing methods; speech processing; speech recognition; Broadcast News task; conventional N-gram language models; discounting schemes; modelling; perplexity reduction; relative frequencies of words; smoothing schemes; variable word rate N-grams; variable word rate assumption; Broadcasting; Computer science; Entropy; Frequency estimation; Information retrieval; Interpolation; Natural languages; Predictive models; Smoothing methods; Statistics
Conference_Title :
Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
Conference_Location :
Istanbul
Print_ISBN :
0-7803-6293-4
DOI :
10.1109/ICASSP.2000.861992
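
Illustration :
The abstract contrasts a single Poisson with a continuous mixture of Poissons as models of how often a word occurs in a document. The sketch below is only an illustration of that contrast and does not reproduce the paper's N-gram estimation procedure. It assumes a gamma mixing distribution (which yields the negative binomial), a method-of-moments fit, and made-up per-document counts; the abstract specifies none of these, and the function names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(k occurrences) under a Poisson with mean lam (lam > 0)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def gamma_poisson_pmf(k, shape, scale):
    """P(k occurrences) when the Poisson rate is Gamma(shape, scale) distributed.

    This continuous mixture of Poissons is the negative binomial distribution.
    The gamma prior is an assumption made for this sketch.
    """
    p = scale / (1.0 + scale)
    log_pmf = (math.lgamma(k + shape) - math.lgamma(shape) - math.lgamma(k + 1)
               + k * math.log(p) + shape * math.log(1.0 - p))
    return math.exp(log_pmf)

def fit_gamma_poisson(counts):
    """Method-of-moments fit (an assumed, simple estimator).

    Uses mean = shape * scale and variance = shape * scale * (1 + scale),
    so it requires the counts to be overdispersed (variance > mean).
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / n
    if var <= mean:
        raise ValueError("counts are not overdispersed; a plain Poisson suffices")
    scale = var / mean - 1.0
    shape = mean / scale
    return shape, scale

def log_likelihood(counts, pmf):
    """Sum of log-probabilities of the observed counts under pmf."""
    return sum(math.log(pmf(c)) for c in counts)

# Hypothetical counts of one word in ten documents: mostly absent, one burst.
counts = [0, 0, 0, 1, 0, 7, 0, 0, 2, 0]
lam = sum(counts) / len(counts)          # constant-rate (single Poisson) estimate
shape, scale = fit_gamma_poisson(counts)

ll_poisson = log_likelihood(counts, lambda k: poisson_pmf(k, lam))
ll_mixture = log_likelihood(counts, lambda k: gamma_poisson_pmf(k, shape, scale))

print("log-likelihood, single Poisson:", ll_poisson)
print("log-likelihood, gamma mixture :", ll_mixture)
```

With bursty counts like these, the mixture assigns a higher likelihood than the constant-rate Poisson because it allows the rate to differ from document to document, which is the observation the abstract starts from.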