  • DocumentCode
    3166771
  • Title
    Conditional leaving-one-out and cross-validation for discount estimation in Kneser-Ney-like extensions
  • Author
    Andrés-Ferrer, J.; Sundermeyer, M.; Ney, H.
  • Author_Institution
    Pattern Recognition & Human Language Technol., Univ. Politec. de Valencia, Valencia, Spain
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    5013
  • Lastpage
    5016
  • Abstract
    The smoothing of n-gram models is a core technique in language modelling (LM). Modified Kneser-Ney (mKN) ranks among the best smoothing techniques. It discounts a fixed quantity from the observed counts in order to approximate the Turing-Good (TG) counts. Although the TG counts optimise the leaving-one-out (L1O) criterion, the discounting parameters introduced in mKN do not. Moreover, the approximation to the TG counts for large counts is heavily simplified. In this work, both issues are addressed: the discounting parameters are estimated by L1O, and better functional forms are used to approximate larger TG counts. The L1O performance is compared with cross-validation (CV) and the mKN baseline on two large-vocabulary tasks.
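    As an illustration of the quantities named in the abstract, the following Python sketch computes the Turing-Good adjusted counts r* = (r + 1) n_{r+1} / n_r from the count-of-counts n_r, and the standard closed-form mKN discounts D1, D2, D3+ (the Chen and Goodman estimates), which can be written as D_r = r - Y * r* with Y = n_1 / (n_1 + 2 n_2). It is a minimal sketch under these assumptions, not the L1O or CV estimation proposed in the paper; the function names and the toy counts are hypothetical.

```python
from collections import Counter


def count_of_counts(counts):
    """Map each count r to n_r, the number of distinct n-grams seen exactly r times."""
    return Counter(counts.values())


def turing_good_counts(n, max_r):
    """Turing-Good adjusted counts r* = (r + 1) * n_{r+1} / n_r for r = 1..max_r.

    Counts r with n_r == 0 are skipped; a missing n_{r+1} is read as 0 (Counter default).
    """
    return {r: (r + 1) * n[r + 1] / n[r] for r in range(1, max_r + 1) if n[r] > 0}


def mkn_discounts(n):
    """Closed-form mKN discounts (D1, D2, D3+), assuming n_1, n_2, n_3 > 0.

    Each discount equals r - Y * r*, i.e. r minus a scaled Turing-Good count,
    so the discounted count r - D_r approximates Y * r*.
    """
    y = n[1] / (n[1] + 2 * n[2])
    tg = turing_good_counts(n, 3)
    return tuple(r - y * tg[r] for r in (1, 2, 3))


def discounted_count(r, discounts):
    """Absolute-discounted count max(r - D(r), 0); every r >= 3 shares D3+."""
    d = discounts[min(r, 3) - 1]
    return max(r - d, 0.0)


if __name__ == "__main__":
    # Hypothetical toy bigram counts: n_1 = 4, n_2 = 2, n_3 = 1, n_4 = 1.
    toy = {
        ("a", "b"): 1, ("a", "c"): 1, ("b", "c"): 1, ("c", "a"): 1,
        ("b", "a"): 2, ("c", "b"): 2, ("a", "a"): 3, ("b", "b"): 4,
    }
    n = count_of_counts(toy)
    ds = mkn_discounts(n)
    print("Turing-Good counts:", turing_good_counts(n, 3))
    print("mKN discounts (D1, D2, D3+):", ds)
    print("discounted counts:", {r: discounted_count(r, ds) for r in (1, 2, 3, 4)})
```

    Every count r >= 3 shares the single discount D3+, which is the simplification for larger counts that the abstract refers to.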
  • Keywords
    natural language processing; Kneser-Ney like extensions; Turing-Good counts; conditional leaving-one-out criterion; cross validation; discount estimation; language modelling; n-gram model; smoothing technique; vocabulary task; Approximation methods; Computational modeling; Estimation; Optimization; Smoothing methods; Standards; Training; Cross Validation; Language Modelling; Leaving-One-Out; modified Kneser-Ney smoothing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
  • Conference_Location
    Kyoto
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4673-0045-2
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2012.6289046
  • Filename
    6289046