  • DocumentCode
    394266
  • Title
    Semantic n-gram language modeling with the latent maximum entropy principle
  • Author
    Wang, Shaojun; Schuurmans, Dale; Peng, Fuchun; Zhao, Yunxin
  • Author_Institution
    Sch. of Comput. Sci., Univ. of Waterloo, Ont., Canada
  • Volume
    1
  • fYear
    2003
  • fDate
    6-10 April 2003
  • Abstract
    We describe a unified probabilistic framework for statistical language modeling, the latent maximum entropy principle, which can effectively incorporate various aspects of natural language, such as local word interaction, syntactic structure and semantic document information. Unlike previous work on maximum entropy methods for language modeling, which allows only explicit features to be modeled, our framework also allows relationships over hidden features to be captured, resulting in a more expressive language model. We describe efficient algorithms for marginalization, inference and normalization in our extended models. We then present experimental results for our approach on the Wall Street Journal corpus.
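    A minimal sketch of the latent maximum entropy (LME) principle the abstract names, under assumed notation not drawn from this record: with observations $x$, hidden variables $z$, feature functions $f_i$ and empirical distribution $\tilde{p}(x)$, LME selects the joint model

    $$p^{*} \;=\; \arg\max_{p} H(p) \quad \text{subject to} \quad \sum_{x,z} p(x,z)\, f_i(x,z) \;=\; \sum_{x} \tilde{p}(x) \sum_{z} p(z \mid x)\, f_i(x,z).$$

    Because the right-hand side depends on $p$ through $p(z \mid x)$, the constraints are self-consistent rather than fixed, unlike ordinary maximum entropy; this is why the abstract emphasizes algorithms for marginalization, inference and normalization, and an EM-style iterative procedure is the natural way to find a feasible solution.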
  • Keywords
    grammars; natural languages; speech recognition; statistical analysis; Wall Street Journal corpus; efficient algorithms; hidden features; inference; latent maximum entropy; local word interaction; marginalization; maximum entropy methods; normalization; semantic document information; semantic n-gram language modeling; semantic smoothing; statistical language modeling; syntactic structure; unified probabilistic framework; Biomedical optical imaging; Computer science; Entropy; Humans; Inference algorithms; Information retrieval; Natural languages; Optical character recognition software; Probability; Speech recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), Proceedings
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7663-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.2003.1198796
  • Filename
    1198796