  • DocumentCode
    424245
  • Title
    Interpolated probabilistic tagging model optimized with genetic algorithm
  • Author
    Wong, Fa; Chao, Sam; Hu, Dong-Cheng; Mao, W-Hang
  • Author_Institution
    Fac. of Sci. & Technol., Macao Univ., China
  • Volume
    4
  • fYear
    2004
  • fDate
    26-29 Aug. 2004
  • Firstpage
    2569
  • Abstract
    We present results of probabilistic tagging of Portuguese texts to show how these techniques perform on a highly morphologically ambiguous inflective language when only a limited corpus is available as the training source. To cope with the ambiguity caused by insufficient training data, especially for unknown words, we incorporate lexical features into the probabilistic model. Unlike other proposed tagging models, these features are introduced into the word probabilities by means of interpolation. A technique based on a genetic algorithm for determining the optimal set of interpolation parameters is described. Our preliminary results show that the tagging model correctly tags 91.8% of the sentences.
  • Keywords
    genetic algorithms; interpolation; probability; text analysis; Portuguese texts; interpolated probabilistic tagging model; word probability; Chaos; Natural language processing; Natural languages; Speech; Statistical analysis; Tagging; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on
  • Print_ISBN
    0-7803-8403-2
  • Type
    conf
  • DOI
    10.1109/ICMLC.2004.1382237
  • Filename
    1382237
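
The abstract describes word probabilities formed by interpolating several lexical feature estimates, with the interpolation parameters chosen by a genetic algorithm. The record does not give the paper's actual features, fitness function, or genetic operators, so the following is only a minimal sketch under stated assumptions: three hypothetical component estimators (word form, suffix, capitalization), invented held-out probabilities, held-out log-likelihood as the fitness, and a simple real-valued genetic algorithm with elitism, tournament selection, blend crossover, and Gaussian mutation. All names (`HELD_OUT`, `interpolate`, `fitness`, `evolve`) are illustrative and not taken from the paper.

```python
import math
import random

# Hypothetical held-out data: each row holds the probability that each of three
# assumed component estimators (word form, suffix, capitalization) assigns to
# the observed word given its correct tag. All values are invented.
HELD_OUT = [
    (0.60, 0.20, 0.10),
    (0.00, 0.30, 0.20),  # unknown word: the word-form estimate is zero
    (0.45, 0.25, 0.05),
    (0.00, 0.40, 0.15),  # unknown word
    (0.70, 0.10, 0.10),
]


def normalize(weights):
    """Force weights to be strictly positive and rescale them to sum to 1."""
    clipped = [max(w, 1e-9) for w in weights]
    total = sum(clipped)
    return [w / total for w in clipped]


def interpolate(weights, components):
    """Interpolated word probability: weighted sum of component estimates."""
    return sum(w * p for w, p in zip(weights, components))


def fitness(weights, data=HELD_OUT):
    """Assumed fitness: held-out log-likelihood of the interpolated model."""
    return sum(math.log(interpolate(weights, row)) for row in data)


def evolve(n_weights=3, pop_size=30, generations=60, mutation_sd=0.1, seed=0):
    """Search for interpolation weights with a small real-valued GA."""
    rng = random.Random(seed)
    population = [normalize([rng.random() for _ in range(n_weights)])
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = population[:2]  # elitism: carry over the two best unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            a = max(rng.sample(population, 3), key=fitness)
            b = max(rng.sample(population, 3), key=fitness)
            # Blend crossover followed by Gaussian mutation.
            alpha = rng.random()
            child = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
            child = [c + rng.gauss(0.0, mutation_sd) for c in child]
            next_gen.append(normalize(child))
        population = next_gen
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("weights:", [round(w, 3) for w in best])
    print("held-out log-likelihood:", round(fitness(best), 3))
```

In this toy setup the unknown-word rows have a zero word-form estimate, so a model relying on word forms alone would assign them zero probability; interpolating with the other estimates keeps every probability positive, which is the role the abstract attributes to the lexical features. A tagger would more plausibly use tagging accuracy on held-out text as the fitness, but that would require a full decoder and is left out of this sketch.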