DocumentCode :
3320456
Title :
The Acquisition and Sentence Alignment for Academic Bilingual Resources Based on Web Paper Libraries
Author :
Sun, Yueheng ; Men, Rui ; Ni, Weijie
Author_Institution :
Sch. of Comput. Sci. & Technol., Tianjin Univ., Tianjin, China
fYear :
2009
fDate :
28-29 Dec. 2009
Firstpage :
45
Lastpage :
48
Abstract :
This paper presents an approach for acquiring academic bilingual resources from Web paper libraries. By analyzing the structural information of the Web pages, we first implement a customized crawler to download the pages containing paper details, and then use a parser to convert them into XML format. Building on the classic statistical method for sentence alignment, we propose an improved approach to align the initial bilingual resources that introduces two additional factors: bilingual keyword pairs and matching patterns. Experimental results show that our sentence aligner, supported by the new approach, improves both precision and recall by 7%.
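The abstract does not name the "classic statistical method". A common choice is the length-based dynamic-programming aligner of Gale and Church, so the sketch below assumes that method and adds a hypothetical `keyword_bonus` term that lowers the cost of a bead when a source keyword and its listed translation co-occur. The bead priors, the `c` and `s2` length parameters, the `weight` of the bonus, and all function names are illustrative assumptions, not values from this paper; the paper's matching-pattern factor is not modeled.

```python
import math
from typing import Dict, List, Optional, Tuple

# Approximate bead priors after Gale & Church (1993); assumed, not from this paper.
PRIORS: Dict[Tuple[int, int], float] = {
    (1, 1): 0.89, (1, 0): 0.005, (0, 1): 0.005,
    (2, 1): 0.045, (1, 2): 0.045, (2, 2): 0.011,
}


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def length_cost(l1: int, l2: int, c: float = 1.0, s2: float = 6.8) -> float:
    """Gale-Church style cost: -log P(length mismatch) for segment lengths l1, l2."""
    if l1 == 0 and l2 == 0:
        return 0.0
    mean = (l1 + l2 / c) / 2.0
    delta = (l2 - l1 * c) / math.sqrt(mean * s2)
    p = 2.0 * (1.0 - norm_cdf(abs(delta)))
    return -math.log(max(p, 1e-12))  # clamp to avoid log(0)


def keyword_bonus(src: str, tgt: str, pairs: List[Tuple[str, str]],
                  weight: float = 1.0) -> float:
    """Hypothetical factor: reward beads where a keyword and its translation co-occur."""
    return weight * sum(1 for s, t in pairs if s in src and t in tgt)


def align(src: List[str], tgt: List[str], pairs: List[Tuple[str, str]],
          c: float = 1.0, weight: float = 1.0) -> List[Tuple[List[int], List[int]]]:
    """Return beads as (source indices, target indices), found by forward DP."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order is a topological order: every bead moves i and/or j forward.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s, t = " ".join(src[i:ni]), " ".join(tgt[j:nj])
                step = (-math.log(prior) + length_cost(len(s), len(t), c)
                        - keyword_bonus(s, t, pairs, weight))
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Trace back from (n, m) to recover the bead sequence.
    beads: List[Tuple[List[int], List[int]]] = []
    i, j = n, m
    while (i, j) != (0, 0):
        prev = back[i][j]
        assert prev is not None
        pi, pj = prev
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

For a Chinese-English pair, `c` would be set to an estimated character-length ratio between the two languages rather than 1.0, and `pairs` would come from a bilingual keyword list such as the keyword fields of each paper record.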
Keywords :
Internet; XML; grammars; natural language processing; statistical analysis; Web pages; Web paper libraries; academic bilingual resources; bilingual keyword pair; customized crawler; matching pattern; sentence alignment; statistical method; Abstracts; Computer science; Crawlers; Information analysis; Libraries; Paper technology; Pattern matching; Uniform resource locators; bilingual keyword pairs; matching patterns;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Research Challenges in Computer Science, 2009. ICRCCS '09. International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3927-0
Electronic_ISBN :
978-1-4244-5410-5
Type :
conf
DOI :
10.1109/ICRCCS.2009.20
Filename :
5401281