Title :
Implementation of Buckwalter transliteration to Malay corpora
Author :
Abu Bakar, Juhaida ; Omar, K. ; Nasrudin, Mohammad Faidzul ; Murah, Mohd Zamri ; Ahmad, Che Wan Shamsul Bahri C-W
Author_Institution :
Sch. of Comput., Univ. Utara Malaysia, Sintok, Malaysia
Abstract :
Assigning lexical categories to words is an important step in the automated analysis of a text. Modern Natural Language Processing (NLP) algorithms are based on machine learning: they learn rules automatically through the analysis of large corpora of typical real-world examples. The Buckwalter transliteration has become a standard in the natural language processing research community that works on Arabic. In this paper, we discuss the encoding of a Malay language corpus written in Jawi. The purpose of this work is to make the corpora conform to a common standard for the characters that Jawi and Arabic share. Four letters that differ from those of the Arabic language were identified, and newly defined Buckwalter symbols were assigned to them. A collection of the 114 chapters of the al-Quran translated into Jawi has been used as the corpus. The corresponding Jawi and Arabic corpora will be used to determine the out-of-vocabulary (OOV) problem in POS tags.
Keywords :
learning (artificial intelligence); natural language processing; text analysis; vocabulary; Al-Quran; Arabic language; Buckwalter symbols; Buckwalter transliteration; Jawi; Malay corpora; NLP algorithm; OOV; POS-tags; automated text analysis; lexical category assignment; machine learning; natural language processing algorithm; out-of-vocabulary problem; rule learning; Encoding; Wheels; Corpora; Jawi script;
Conference_Title :
Intelligent Systems Design and Applications (ISDA), 2013 13th International Conference on
Conference_Location :
Bangi
Print_ISBN :
978-1-4799-3515-4
DOI :
10.1109/ISDA.2013.6920737