DocumentCode :
2193050
Title :
Automatic Construction of Web-Based English/Chinese Parallel Corpora
Author :
Tan Bin ; Zhou Xu-yan
Author_Institution :
Dept. of Comput., Jinggangshan Univ., Ji'an, China
fYear :
2010
fDate :
2-4 April 2010
Firstpage :
114
Lastpage :
117
Abstract :
As the demand for global information grows, multilingual corpora have become a valuable linguistic resource for cross-lingual information retrieval and natural language processing applications. This paper presents a technique for automatically constructing a Web-based English-Chinese parallel corpus to address the shortage of English-Chinese bilingual parallel corpora. First, web pages that may be translations of one another are mined from particular sources. Next, candidate parallel page pairs are extracted from these pages according to the similarity of their external characteristics. Finally, each candidate pair is assessed with content-based methods. For this assessment, the paper designs ECVS, a bilingual text similarity model based on the classic vector space model.
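The abstract does not give the ECVS formula, so the following is only a minimal sketch of the general pipeline it describes, under stated assumptions. Candidate pairing by "external characteristics" is approximated here by URLs that differ only by a language marker, and content assessment uses plain cosine similarity over term-frequency vectors after mapping Chinese tokens into English through a bilingual lexicon. `LEXICON`, `LANG_MARKERS`, `candidate_pairs` and `bilingual_similarity` are hypothetical names, the Chinese input is assumed to be already word-segmented, and none of this should be read as the authors' actual ECVS model.

```python
import math
from collections import Counter

# Hypothetical toy Chinese -> English lexicon (an assumption, not from the paper).
LEXICON = {
    "计算机": "computer",
    "科学": "science",
    "大学": "university",
    "学院": "college",
    "信息": "information",
}

# Hypothetical URL language markers used as the "external characteristic".
LANG_MARKERS = [("/en/", "/cn/"), ("_en.", "_cn."), ("english", "chinese")]


def candidate_pairs(en_urls, zh_urls):
    """Pair English and Chinese pages whose URLs differ only by a language marker."""
    zh_set = set(zh_urls)
    pairs = []
    for en in en_urls:
        for en_mark, zh_mark in LANG_MARKERS:
            if en_mark in en:
                zh = en.replace(en_mark, zh_mark)
                if zh in zh_set:
                    pairs.append((en, zh))
                    break  # keep the first matching marker only
    return pairs


def to_vector(tokens, lexicon=None):
    """Term-frequency vector; if a lexicon is given, tokens are translated
    through it and tokens missing from the lexicon are dropped."""
    if lexicon is not None:
        tokens = [lexicon[t] for t in tokens if t in lexicon]
    return Counter(t.lower() for t in tokens)


def cosine(u, v):
    """Classic vector space model similarity between two sparse vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def bilingual_similarity(en_tokens, zh_tokens, lexicon=LEXICON):
    """Score a candidate pair by projecting both texts into one English term space."""
    return cosine(to_vector(en_tokens), to_vector(zh_tokens, lexicon))
```

For example, `candidate_pairs(["http://x.edu/en/about.html"], ["http://x.edu/cn/about.html"])` pairs the two pages, and `bilingual_similarity(["Computer", "Science", "College"], ["计算机", "科学", "学院"])` scores close to 1.0, while texts sharing no translated terms score 0.0. A real system would need a large dictionary and a threshold for accepting a pair.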
Keywords :
Internet; content-based retrieval; natural language processing; English-Chinese parallel corpora; Web pages; Web-based parallel corpora; automatic construction technology; bilingual text similarity; content-based methods; cross-lingual information retrieval; multilingual corpora; vector space model; Informatics; Information security; Information technology; Jacobi correlation coefficient; Parallel corpora; vector space;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Information Technology and Security Informatics (IITSI), 2010 Third International Symposium on
Conference_Location :
Jinggangshan
Print_ISBN :
978-1-4244-6730-3
Electronic_ISBN :
978-1-4244-6743-3
Type :
conf
DOI :
10.1109/IITSI.2010.124
Filename :
5453637