• DocumentCode
    2261229
  • Title

    Using genetic algorithm for Persian grammar induction

  • Author

    Arabsorkhi, Mohsen ; Faili, Hesham ; Jahroumi, Mansoor Zolghadri

  • Author_Institution
    Comput. Eng. Dept., Islamic Azad Univ. of Saveh, Saveh, Iran
  • fYear
    2009
  • fDate
    24-27 Sept. 2009
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Most efficient computational approaches to NLP tasks are supervised methods that require annotated corpora. The lack of annotated data for Persian, however, has encouraged researchers to turn to unsupervised and semi-supervised approaches. This paper presents a novel semi-supervised approach, called Genetic-based inside-outside (GIO), for Persian grammar inference, which induces a grammar model in the PCFG formalism. GIO is an extension of the inside-outside algorithm enriched with notions from genetic algorithms. In a pure genetic algorithm for grammar induction, a randomly generated initial population makes the search computationally expensive, so we use the inside-outside algorithm to generate the initial population. Our experiments show that our approach outperforms the other methods that have been applied to Persian grammar induction.
  • Keywords
    computational linguistics; genetic algorithms; grammars; natural language processing; PCFG formalism; Persian grammar induction; Persian grammar inference; genetic algorithm; genetic-based inside-outside; initial population random generation; semi-supervised approach; Computer science; Genetic algorithms; Genetic engineering; Hidden Markov models; Induction generators; Inference algorithms; Iterative methods; Natural languages; Statistical analysis; Tagging; Grammar induction; Persian grammar; inside-outside algorithm
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
  • Conference_Location
    Dalian
  • Print_ISBN
    978-1-4244-4538-7
  • Electronic_ISBN
    978-1-4244-4540-0
  • Type

    conf

  • DOI
    10.1109/NLPKE.2009.5313851
  • Filename
    5313851