• DocumentCode
    3141990
  • Title

    Improved word alignment in patent domain

  • Author

    Li, Zezhong ; Ikeda, Hideto ; Hung, Nguyen Thanh ; Huang, Degen

  • Author_Institution
    Dept. of Comput. Sci., Ritsumeikan Univ., Kusatsu, Japan
  • fYear
    2011
  • fDate
    27-29 Nov. 2011
  • Firstpage
    209
  • Lastpage
    213
  • Abstract
    This paper presents a new method for word alignment in the patent domain that incorporates both generative and discriminative models. The framework keeps the advantage of the generative model, which can automatically learn large numbers of parameters from a sentence-aligned parallel corpus in an unsupervised way, while gaining an improvement from discriminative models, which can exploit various features in a supervised way. Even with only 300 word-aligned Chinese-English sentence pairs, combined with 1M parallel Chinese-English patent sentences released by NTCIR9, experiments show that our method achieves promising performance.
  • Keywords
    language translation; natural language processing; patents; text analysis; unsupervised learning; NTCIR9; discriminative model; generative model; machine translation; parallel Chinese-English patent sentence; patent domain; sentence-aligned parallel corpus; word alignment; word-aligned Chinese-English sentence pair; Computational modeling; Computer science; Entropy; Hidden Markov models; Training; Viterbi algorithm; parallel corpus; patent
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering (NLP-KE), 2011 7th International Conference on
  • Conference_Location
    Tokushima
  • Print_ISBN
    978-1-61284-729-0
  • Type

    conf

  • DOI
    10.1109/NLPKE.2011.6138196
  • Filename
    6138196