Title of article :
Improving the Accuracy of English-Arabic Statistical Sentence Alignment
Author/Authors :
Salameh, Mohammad Lebanese American University - Department of Computer Science and Mathematics, Lebanon , Zantout, Rached Prince Sultan University - College of Computer and Information Sciences, Saudi Arabia , Mansour, Nashat Lebanese American University - Department of Computer Science and Mathematics, Lebanon
Abstract :
Multilingual natural language processing systems increasingly rely on parallel corpora to improve their output. Parallel corpora constitute the basic building block for training a statistical natural language processing system and for creating translation and language models. Several systems have been devised that automatically align the words of a pair of sentences, each in a different language. Such systems have been used successfully with European languages. In this paper, one such system is used to align sentences in an English-Arabic corpus. The system performs poorly when given raw, unaligned English-Arabic sentence pairs. This prompted the development of a preprocessing step to be applied to the Arabic sentences. The same corpus was then preprocessed, and a significant improvement is reported when alignment is attempted using the preprocessed unaligned sentences.
Keywords :
Word alignment, sentence alignment, parallel corpora, statistical natural language processing
Journal title :
The International Arab Journal of Information Technology (IAJIT)