  • DocumentCode
    1594314
  • Title
    Implementation of Buckwalter transliteration to Malay corpora
  • Author
    Abu Bakar, Juhaida ; Omar, K. ; Nasrudin, Mohammad Faidzul ; Murah, Mohd Zamri ; Ahmad, Che Wan Shamsul Bahri C-W
  • Author_Institution
    Sch. of Comput., Univ. Utara Malaysia, Sintok, Malaysia
  • fYear
    2013
  • Firstpage
    213
  • Lastpage
    218
  • Abstract
    Assigning lexical categories to words is an important step in the automated analysis of a text. Modern Natural Language Processing (NLP) algorithms are based on machine learning: they learn rules automatically through the analysis of large corpora of typical real-world examples. The Buckwalter transliteration has become a standard in the natural language processing research community that works on Arabic. In this paper, we discuss the encoding of a Malay language corpus written in Jawi. The purpose of this work is to make the corpora conform and to standardize them across the characters the two scripts have in common. Four letters that differ from those of the Arabic language were identified, and newly defined Buckwalter symbols were assigned to them. A collection of the 114 chapters of the al-Quran translated into Jawi has been used as the corpus. The similar corpora in Jawi and Arabic will be used to determine the out-of-vocabulary (OOV) problem in POS tags.
  • Keywords
    learning (artificial intelligence); natural language processing; text analysis; vocabulary; Al-Quran; Arabic language; Buckwalter symbols; Buckwalter transliteration; Jawi; Malay corpora; NLP algorithm; OOV; POS-tags; automated text analysis; lexical category assignment; machine learning; natural language processing algorithm; out-of-vocabulary problem; rule learning; Encoding; Wheels; Corpora; Jawi script
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems Design and Applications (ISDA), 2013 13th International Conference on
  • Conference_Location
    Bangi
  • Print_ISBN
    978-1-4799-3515-4
  • Type
    conf
  • DOI
    10.1109/ISDA.2013.6920737
  • Filename
    6920737