DocumentCode :
477923
Title :
A Research on Length Based Sentence Alignment for Chinese-English Parallel Corpus
Author :
Zan, Hongying ; Zhang, Xia ; Fan, Ming
Author_Institution :
Coll. of Inf. & Eng., Zhengzhou Univ., Zhengzhou
Volume :
4
fYear :
2008
fDate :
18-20 Oct. 2008
Firstpage :
145
Lastpage :
149
Abstract :
Many existing length-based Chinese-English sentence alignment methods compute sentence length as the number of bytes. In this paper, we examine the effectiveness of six different ways of computing sentence length, which take, respectively, the number of verbs, nouns, adjectives, content words, bytes, and all words in a sentence as its length. Most previous methods are found to be memory-consuming and inefficient. This paper proposes an alignment method that saves memory and time by grouping sentences for alignment. Our experimental results show that taking all words into account in the sentence length computation can further enhance alignment performance, giving 99.01% precision and 99.5% recall.
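To make the idea of length-based alignment concrete, the following is a minimal illustrative sketch in Python, not the authors' implementation. It assumes sentences are already segmented into word tokens, uses UTF-8 for the byte count (the paper's encoding is not stated here), and uses a simple hand-picked cost (absolute length mismatch plus a fixed penalty per alignment shape). The names `length_in_words`, `length_in_bytes`, `bead_cost`, `align`, and the penalty values are all hypothetical. The grouping strategy mentioned in the abstract is not reproduced.

```python
from typing import Dict, List, Sequence, Tuple

Sentence = List[str]  # assumption: a sentence is a list of pre-segmented word tokens


def length_in_words(sentence: Sentence) -> int:
    """Sentence length measured as the number of word tokens."""
    return len(sentence)


def length_in_bytes(sentence: Sentence, encoding: str = "utf-8") -> int:
    """Sentence length measured as the number of bytes in the given encoding."""
    return len("".join(sentence).encode(encoding))


# Allowed alignment shapes (source sentences, target sentences) and
# hand-picked penalties for the less common shapes (assumed values).
BEAD_PENALTY: Dict[Tuple[int, int], float] = {
    (1, 1): 0.0,
    (2, 1): 1.0,
    (1, 2): 1.0,
    (1, 0): 3.0,
    (0, 1): 3.0,
}


def bead_cost(src_len: float, tgt_len: float, ratio: float, bead: Tuple[int, int]) -> float:
    """Cost of pairing a source span with a target span: length mismatch plus shape penalty."""
    return abs(src_len * ratio - tgt_len) + BEAD_PENALTY[bead]


def align(src_lens: Sequence[float], tgt_lens: Sequence[float], ratio: float = 1.0):
    """Dynamic-programming alignment over sentence lengths.

    Returns a list of (source_indices, target_indices) pairs in document order.
    `ratio` is the expected target length per unit of source length.
    """
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj) in BEAD_PENALTY:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(
                    sum(src_lens[i:ni]), sum(tgt_lens[j:nj]), ratio, (di, dj)
                )
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Trace back from the end of both documents to recover the alignment.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


# Usage: three source sentences against two target sentences, lengths in words.
print(align([10, 4, 6], [10, 10]))  # [([0], [0]), ([1, 2], [1])]
```

Swapping `length_in_words` for `length_in_bytes` (or for a count of verbs, nouns, adjectives, or content words, given a part-of-speech tagger) changes only the numbers fed to `align`, which is what makes the comparison of length measures described in the abstract straightforward to set up. Note that this sketch keeps a full (n+1) by (m+1) table in memory, the kind of cost that aligning smaller groups of sentences would reduce.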
Keywords :
natural language processing; Chinese-English parallel corpus; length based sentence alignment; Concurrent computing; Dictionaries; Educational institutions; Fuzzy systems; Heuristic algorithms; Knowledge engineering; Large scale integration; Natural languages; Performance analysis; Terminology; NLP; parallel corpus; sentence alignment
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
Conference_Location :
Jinan, Shandong
Print_ISBN :
978-0-7695-3305-6
Type :
conf
DOI :
10.1109/FSKD.2008.307
Filename :
4666373