• DocumentCode
    497024
  • Title
    Chinese-Uighur Sentence Alignment Based on Hybrid Strategy with Mistake Spread Suppression
  • Author
    Tian, Shengwei; Ibrahim, Turgun; Umal, Hasan; Yu, Long
  • Author_Institution
    Coll. of Inf. Sci. & Eng., Xinjiang University, Urumqi, China
  • Volume
    2
  • fYear
    2009
  • fDate
    4-5 July 2009
  • Firstpage
    683
  • Lastpage
    688
  • Abstract
    This paper proposes a hybrid algorithm based on mistake spread suppression to align Chinese-Uighur sentences. To address the spread of alignment mistakes in length-based alignment algorithms, the paper presents a new suppression strategy for mistake spread. The strategy omits Chinese word segmentation and part-of-speech tagging. Using punctuation characteristics, sentence length and Chinese-Uighur correspondence information, it identifies 1:1 sentence pairs as anchor points that suppress the spread of mistakes. Between anchor points, a hybrid strategy based on both length and punctuation is used to align sentences. Experimental results verify the high precision of anchor-point identification and the effective restraint of alignment mistake spread. The hybrid alignment algorithm avoids the high time complexity of word-based alignment algorithms. In addition, it outperforms traditional alignment algorithms, reducing the alignment mistake ratio from 4.8% to 2.3%.
  • Keywords
    computational complexity; natural language processing; Chinese-Uighur sentence alignment; bilingual corpora; hybrid alignment algorithm; mistake spread suppression; pattern sentence pair; time complexity alignment algorithm; Computational Intelligence Society; Data mining; Dictionaries; Educational institutions; Information retrieval; Information science; Large-scale systems; Paper technology; Pattern matching; Tagging; Bilingual Corpora; Hybrid Strategy; Mistake Spread Suppression; Sentence Alignment;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Environmental Science and Information Application Technology, 2009. ESIAT 2009. International Conference on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-0-7695-3682-8
  • Type
    conf
  • DOI
    10.1109/ESIAT.2009.208
  • Filename
    5199985
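The abstract describes a two-stage pipeline: first identify high-confidence 1:1 sentence pairs as anchor points, then align the sentences between consecutive anchors using length and punctuation. The record does not give the paper's features, thresholds or the form of its Chinese-Uighur correspondence information, so the Python below is only an illustrative sketch under stated assumptions. The punctuation mapping `PUNCT_MAP`, the cost function `pair_cost`, and the constants `SKIP_COST`, `MERGE_COST`, `window` and `max_cost` are hypothetical. Anchor selection is a greedy search near the diagonal, the gap alignment is a simplified Gale-Church-style dynamic program, and lexical correspondence is omitted entirely.

```python
from typing import List, Optional, Tuple

# Hypothetical punctuation normalization: full-width Chinese and Arabic-script
# Uighur marks are mapped onto a shared ASCII alphabet.
PUNCT_MAP = {"，": ",", "。": ".", "？": "?", "！": "!", "；": ";", "：": ":",
             "،": ",", "؟": "?", "؛": ";"}
PUNCT = set(",.?!;:")

# Hypothetical cost constants (not taken from the paper).
SKIP_COST = 2.0   # cost of a 1:0 or 0:1 bead
MERGE_COST = 0.3  # extra cost of a 2:1 or 1:2 bead
PATTERNS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]

Bead = Tuple[List[int], List[int]]  # (Chinese indices, Uighur indices)


def punct_signature(s: str) -> str:
    """Normalized punctuation marks of s, in order."""
    return "".join(p for p in (PUNCT_MAP.get(ch, ch) for ch in s) if p in PUNCT)


def pair_cost(zh: str, ug: str, ratio: float) -> float:
    """Relative length mismatch plus 1.0 if the punctuation signatures differ."""
    expected = ratio * len(zh)
    length_cost = abs(len(ug) - expected) / max(len(ug), expected, 1.0)
    return length_cost + (0.0 if punct_signature(zh) == punct_signature(ug) else 1.0)


def find_anchors(zh: List[str], ug: List[str], ratio: float,
                 window: int = 3, max_cost: float = 0.2) -> List[Tuple[int, int]]:
    """Greedily pick monotonic 1:1 pairs near the diagonal with matching
    non-empty punctuation and a small length mismatch."""
    anchors: List[Tuple[int, int]] = []
    last_j = -1
    for i, s in enumerate(zh):
        sig = punct_signature(s)
        centre = round(i * len(ug) / max(len(zh), 1))
        best: Optional[Tuple[float, int]] = None
        for j in range(max(last_j + 1, centre - window), min(len(ug), centre + window + 1)):
            if sig and sig == punct_signature(ug[j]):
                c = pair_cost(s, ug[j], ratio)
                if c <= max_cost and (best is None or c < best[0]):
                    best = (c, j)
        if best is not None:
            anchors.append((i, best[1]))
            last_j = best[1]
    return anchors


def align_segment(zh: List[str], ug: List[str], i0: int, i1: int,
                  j0: int, j1: int, ratio: float) -> List[Bead]:
    """Minimum-cost alignment of zh[i0:i1] with ug[j0:j1] by dynamic programming."""
    n, m = i1 - i0, j1 - j0
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for a in range(n + 1):
        for b in range(m + 1):
            if cost[a][b] == inf:
                continue
            for da, db in PATTERNS:
                na, nb = a + da, b + db
                if na > n or nb > m:
                    continue
                if da == 0 or db == 0:
                    step = SKIP_COST
                else:
                    z = "".join(zh[i0 + a:i0 + na])
                    u = "".join(ug[j0 + b:j0 + nb])
                    step = pair_cost(z, u, ratio) + (MERGE_COST if da + db > 2 else 0.0)
                if cost[a][b] + step < cost[na][nb]:
                    cost[na][nb] = cost[a][b] + step
                    back[na][nb] = (da, db)
    beads: List[Bead] = []
    a, b = n, m
    while a > 0 or b > 0:  # trace back from the end of the segment
        da, db = back[a][b]
        beads.append((list(range(i0 + a - da, i0 + a)), list(range(j0 + b - db, j0 + b))))
        a, b = a - da, b - db
    return beads[::-1]


def align(zh: List[str], ug: List[str]) -> List[Bead]:
    """Fix anchors as 1:1 beads, then align each gap between consecutive anchors."""
    ratio = sum(len(s) for s in ug) / max(sum(len(s) for s in zh), 1)
    beads: List[Bead] = []
    prev_i = prev_j = 0
    for i, j in find_anchors(zh, ug, ratio) + [(len(zh), len(ug))]:  # end sentinel
        beads.extend(align_segment(zh, ug, prev_i, i, prev_j, j, ratio))
        if i < len(zh):
            beads.append(([i], [j]))
        prev_i, prev_j = i + 1, j + 1
    return beads
```

In this sketch, anchors split the bitext into independent gaps, so a wrong bead inside one gap cannot shift the alignment past the next anchor. This mirrors the intuition behind mistake spread suppression in the abstract, and it also keeps each dynamic-programming table small.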