• DocumentCode
    2349130
  • Title
    Improving Chinese-English patent machine translation using sentence segmentation
  • Author
    Jin, Yaohong; Liu, Zhiying
  • Author_Institution
    Inst. of Chinese Inf. Process., Beijing Normal Univ., Beijing, China
  • fYear
    2010
  • fDate
    21-23 Aug. 2010
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    This paper presents a method that uses sentence segmentation to improve the performance of Chinese-English patent machine translation. In this method, long Chinese sentences are segmented into separate short sentences using features from the Hierarchical Network of Concepts (HNC) theory. The semantic features introduced include the main verb of a CSC (Eg), the main verb of a CSP (Egp), long NPs, and conjunctions. The main purpose of the segmentation algorithm is to determine whether a CSC can stand as a separate sentence. The segmentation method was integrated with a rule-based MT system. The order of the resulting short translations is adjusted, and differences in expression between Chinese and English are also taken into account. Experimental results show that the method improves the performance of Chinese-English patent translation. The method has been integrated into an online patent MT system running at SIPO.
  • Keywords
    language translation; natural language processing; patents; word processing; Chinese-English patent machine translation; HNC theory; Hierarchical Network of Concepts theory; SIPO; online patent machine translation system; rule base machine translation system; sentence segmentation; short sentence translations sequence; Buildings; Google; Machine Translation; long NP; main verb; semantic features
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-6896-6
  • Type
    conf
  • DOI
    10.1109/NLPKE.2010.5587855
  • Filename
    5587855