• DocumentCode
    3593601
  • Title

    Complement the comparable corpus obtained from websites

  • Author

    Youliang, Zhou ; Zhengxian, Gong ; Guodong, Zhou

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Soochow Univ., Suzhou, China
  • Volume
    1
  • fYear
    2010
  • Abstract
    This paper proposes a method for automatically extracting high-quality phrase translation tuples from web corpora and, for the first time, discusses an automatic way to complement the missing part of a bilingual corpus. It analyzes the features of bilingual translation pairs in web pages and then uses a statistical discriminative model that combines multiple features to extract translation pairs. Experimental results show that, after this processing, the corpus is aligned well enough for related research.
  • Keywords
    Web sites; linguistics; natural language processing; Chinese information processing; Web corpora; Web page; Web site; bilingual corpora; bilingual translation pairs; high quality phrase translation tuples; statistical discriminative model; Computer science; Data mining; Explosives; Frequency; Information processing; Internet; Natural languages; Paper technology; Trade agreements; Web pages; complement of bilingual alignment; named entities phrase
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Future Computer and Communication (ICFCC), 2010 2nd International Conference on
  • Print_ISBN
    978-1-4244-5821-9
  • Type

    conf

  • DOI
    10.1109/ICFCC.2010.5497762
  • Filename
    5497762