DocumentCode
3317686
Title
Sentence alignment using hybrid model
Author
Fattah, Mohamed Abdel ; Ren, Fuji ; Kuroiwa, Shingo
Author_Institution
Fac. of Eng., Tokushima Univ., Japan
fYear
2005
fDate
30 Oct.-1 Nov. 2005
Firstpage
388
Lastpage
392
Abstract
Parallel corpora have become an essential resource for work in multilingual natural language processing. However, sentence-aligned parallel corpora are more efficient than non-aligned parallel corpora for cross-language information retrieval and machine translation applications. In this paper, we present a new approach to aligning sentences in bilingual parallel corpora based on the character length of the text between successive punctuation marks. A probabilistic score is assigned to each proposed correspondence of texts, based on the scaled difference of the lengths of the two texts (in characters) and the variance of this difference. Using this score, the time required for punctuation matching decreased and the sentence alignment precision increased. With this new approach, we achieved a 21.8% improvement over the length-based approach when applied to English-Arabic parallel documents.
Keywords
language translation; linguistics; natural languages; statistical analysis; bilingual parallel corpora; cross language information retrieval; machine translation; multilingual natural language processing; sentence aligned parallel corpora; Dictionaries; Dolphins; Information retrieval; Natural language processing; Natural languages; Performance analysis; Rivers; Seals; Terminology;
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
Print_ISBN
0-7803-9361-9
Type
conf
DOI
10.1109/NLPKE.2005.1598768
Filename
1598768