DocumentCode :
3593601
Title :
Complement the comparable corpus obtained from websites
Author :
Youliang, Zhou ; Zhengxian, Gong ; Guodong, Zhou
Author_Institution :
Sch. of Comput. Sci. & Technol., Soochow Univ., Suzhou, China
Volume :
1
fYear :
2010
Abstract :
This paper proposes a method for automatically extracting high-quality phrase translation tuples from web corpora and, for the first time, discusses an automatic way to complement the missing part of bilingual corpora. It analyzes the features of bilingual translation pairs in web pages and then uses a statistical discriminative model that combines multiple features to extract translation pairs. Experimental results show that, after the method is applied, the corpus is aligned well enough for related research.
Keywords :
Web sites; linguistics; natural language processing; Chinese information processing; Web corpora; Web page; Web site; bilingual corpora; bilingual translation pairs; high quality phrase translation tuples; statistical discriminative model; Computer science; Data mining; Explosives; Frequency; Information processing; Internet; Natural languages; Paper technology; Trade agreements; Web pages; complement of bilingual alignment; named entities phrase
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Future Computer and Communication (ICFCC), 2010 2nd International Conference on
Print_ISBN :
978-1-4244-5821-9
Type :
conf
DOI :
10.1109/ICFCC.2010.5497762
Filename :
5497762