• DocumentCode
    3317686
  • Title

    Sentence alignment using hybrid model

  • Author

    Fattah, Mohamed Abdel ; Ren, Fuji ; Kuroiwa, Shingo

  • Author_Institution
    Fac. of Eng., Tokushima Univ., Japan
  • fYear
    2005
  • fDate
    30 Oct.-1 Nov. 2005
  • Firstpage
    388
  • Lastpage
    392
  • Abstract
    Parallel corpora have become an essential resource for work in multilingual natural language processing. However, sentence-aligned parallel corpora are more efficient than non-aligned parallel corpora for cross-language information retrieval and machine translation applications. In this paper, we present a new approach to aligning sentences in bilingual parallel corpora based on the character length of the text between successive punctuation marks. A probabilistic score is assigned to each proposed correspondence of texts, based on the scaled difference of the lengths of the two texts (in characters) and the variance of this difference. Using this score, the time required for punctuation matching decreased and sentence alignment precision increased. With this new approach, we achieved a 21.8% improvement over the length-based approach when applied to English-Arabic parallel documents.
  • Keywords
    language translation; linguistics; natural languages; statistical analysis; bilingual parallel corpora; cross language information retrieval; machine translation; multilingual natural language processing; sentence aligned parallel corpora; Dictionaries; Dolphins; Information retrieval; Natural language processing; Natural languages; Performance analysis; Rivers; Seals; Terminology;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
  • Print_ISBN
    0-7803-9361-9
  • Type

    conf

  • DOI
    10.1109/NLPKE.2005.1598768
  • Filename
    1598768
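
The abstract describes a probabilistic score built from the scaled difference of two text lengths (in characters) and the variance of that difference, but it does not give the formula. The sketch below is one plausible reading, modeled on the classic Gale and Church length-based score and not taken from the paper itself. The function name `length_score` and the parameters `c` (expected target characters per source character) and `s2` (variance per character) are assumptions introduced for illustration; the defaults are the values Gale and Church reported for European language pairs and would need re-estimating for English-Arabic.

```python
import math


def length_score(len_src, len_tgt, c=1.0, s2=6.8):
    """Return a value in [0, 1] for how plausibly two text spans correspond.

    Illustrative only: a Gale-Church-style score, assumed rather than
    taken from the paper. len_src and len_tgt are character counts.
    """
    if len_src == 0 and len_tgt == 0:
        return 1.0  # two empty spans trivially match

    # Average length in source units; scales the expected variance.
    mean = (len_src + len_tgt / c) / 2.0

    # Scaled difference of lengths, normalised by its standard deviation.
    delta = (len_tgt - len_src * c) / math.sqrt(mean * s2)

    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|delta|)).
    return math.erfc(abs(delta) / math.sqrt(2.0))


if __name__ == "__main__":
    # Spans of similar length score higher than spans of very different length.
    print(length_score(100, 100))
    print(length_score(100, 150))
```

An aligner would typically take the negative logarithm of this value as a cost and search for the lowest-cost pairing of spans between punctuation marks, for example with dynamic programming.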