• DocumentCode
    3530726
  • Title

    Extensions of absolute discounting (Kneser-Ney method)

  • Author

    Andrés-Ferrer, Jesús ; Ney, H.

  • Author_Institution
    Univ. Politec. de Valencia, Valencia
  • fYear
    2009
  • fDate
    19-24 April 2009
  • Firstpage
    4729
  • Lastpage
    4732
  • Abstract
    The problem of estimating the parameters of an n-gram language model is a typical problem of estimating small probabilities. So far, two methods have been proposed and used to handle this problem: (1) the empirical Bayes method, which results in the Turing-Good estimates. These estimates do not have any constraints and tend to be very noisy. (2) Discounting models such as absolute (or linear) discounting. The discounting models are heavily constrained and typically have only a single free parameter. Both methods can be formulated in a leaving-one-out framework. In this paper, we study methods that lie between these two extremes. We design models with various types of constraints and derive efficient algorithms for estimating the parameters of these models. We propose two novel types of constraints or models: interval constraints and the exact extended Kneser-Ney model. The proposed methods are implemented and applied to language modelling in order to compare the methods in terms of perplexities. The results show that the new constrained methods outperform other unconstrained methods.
  • Keywords
    Bayes methods; computational linguistics; Bayes method; Kneser-Ney method; Turing-Good estimates; absolute discounting; n-gram language model; Algorithm design and analysis; Bayesian methods; Parameter estimation; Proposals; Smoothing methods; Training data; Kneser-Ney smoothing; language modelling; language smoothing; leaving one out
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
  • Conference_Location
    Taipei
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-2353-8
  • Electronic_ISBN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2009.4960687
  • Filename
    4960687