DocumentCode
3593601
Title
Complement the comparable corpus obtained from websites
Author
Youliang, Zhou ; Zhengxian, Gong ; Guodong, Zhou
Author_Institution
Sch. of Comput. Sci. & Technol., Soochow Univ., Suzhou, China
Volume
1
fYear
2010
Abstract
This paper proposes a method for automatically extracting high-quality phrase translation tuples from web corpora and, for the first time, discusses an automatic way to complement the missing part of bilingual corpora. It analyzes the features of bilingual translation pairs in web pages, then uses a statistical discriminative model that combines multiple features to extract translation pairs. Experimental results show that the resulting corpus is aligned well enough for related research.
Keywords
Web sites; linguistics; natural language processing; Chinese information processing; Web corpora; Web page; Web site; bilingual corpora; bilingual translation pairs; high quality phrase translation tuples; statistical discriminative model; Computer science; Data mining; Explosives; Frequency; Information processing; Internet; Natural languages; Paper technology; Trade agreements; Web pages; complement of bilingual alignment; named entities phrase
fLanguage
English
Publisher
IEEE
Conference_Titel
Future Computer and Communication (ICFCC), 2010 2nd International Conference on
Print_ISBN
978-1-4244-5821-9
Type
conf
DOI
10.1109/ICFCC.2010.5497762
Filename
5497762