• DocumentCode
    1908657
  • Title
    Findings and Considerations in Active Learning Based Framework for Resource-Poor SMT
  • Author
    Jinhua Du ; Meng Zhang
  • Author_Institution
    Sch. of Autom. & Inf. Eng., Xi'an Univ. of Technol., Xi'an, China
  • fYear
    2013
  • fDate
    17-19 Aug. 2013
  • Firstpage
    95
  • Lastpage
    98
  • Abstract
    Active learning (AL) for resource-poor SMT is an efficient and feasible way to acquire high-quality parallel data and improve translation quality. This paper first studies two mainstream sentence selection algorithms, Geom-phrase and Geom n-gram, and then proposes a sentence-perplexity-based selection method. Some important findings, such as the impact of sentence length on AL performance, are observed in comparison experiments conducted on Chinese-English NIST data. Accordingly, a preprocessing strategy is presented to filter the original monolingual corpus in order to obtain higher-information sentences. Experimental results on the preprocessed data show that the performance of the three selection algorithms is significantly improved compared to the results on the original data.
  • Keywords
    language translation; learning (artificial intelligence); natural language processing; AL performance; Chinese-English NIST data; Geom n-gram; Geom-phrase; active learning based framework; high-quality parallel data; higher-information sentences; mainstream sentence selection algorithms; monolingual corpus; preprocessed data; preprocessing strategy; resource-poor SMT; sentence perplexity based selection method; translation quality; Algorithm design and analysis; Educational institutions; Measurement; NIST; Probability; System performance; Training; Active Learning; Data Preprocessing; High-information Sentence Selection; Statistical Machine Translation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asian Language Processing (IALP), 2013 International Conference on
  • Conference_Location
    Urumqi
  • Type
    conf
  • DOI
    10.1109/IALP.2013.28
  • Filename
    6646012