Complement the comparable corpus obtained from websites

Author

Zhou, Youliang; Gong, Zhengxian; Zhou, Guodong

Author_Institution

School of Computer Science and Technology, Soochow University, Suzhou, China

Volume

1

fYear

2010

Abstract

This paper proposes a method to automatically extract high-quality phrase translation tuples from web corpora and, for the first time, discusses an automatic way to complement the missing parts of bilingual corpora. It analyzes the features of bilingual translation pairs in web pages and then uses a statistical discriminative model that combines multiple features to extract translation pairs. Experimental results show that, after complementation, the corpus is aligned well enough to support related research.

Keywords

Web sites; linguistics; natural language processing; Chinese information processing; Web corpora; Web page; Web site; bilingual corpora; bilingual translation pairs; high quality phrase translation tuples; statistical discriminative model; Computer science; Data mining; Explosives; Frequency; Information processing; Internet; Natural languages; Paper technology; Trade agreements; Web pages; complement of bilingual alignment; named entities phrase

fLanguage

English

Publisher

IEEE

Conference_Title

2010 2nd International Conference on Future Computer and Communication (ICFCC)

Print_ISBN

978-1-4244-5821-9

Type

conf

DOI

10.1109/ICFCC.2010.5497762

Filename

5497762