DocumentCode
1594314
Title
Implementation of Buckwalter transliteration to Malay corpora
Author
Abu Bakar, Juhaida ; Omar, K. ; Nasrudin, Mohammad Faidzul ; Murah, Mohd Zamri ; Ahmad, Che Wan Shamsul Bahri C-W
Author_Institution
Sch. of Comput., Univ. Utara Malaysia, Sintok, Malaysia
fYear
2013
Firstpage
213
Lastpage
218
Abstract
Assigning lexical categories to words is an important step in the automated analysis of a text. Modern Natural Language Processing (NLP) algorithms are based on machine learning: they learn rules automatically through the analysis of large corpora of typical real-world examples. The Buckwalter transliteration has become a standard in the natural language processing research community that works on Arabic. In this paper, we discuss the encoding of a Malay-language corpus written in Jawi. The purpose of this work is to make the corpora conform to a common standard for the characters that the two scripts share. Four letters that differ from those of the Arabic language were identified, and newly defined Buckwalter symbols were assigned to them. A collection of the 114 chapters of the al-Quran translated into Jawi was used as the corpus. The comparable corpora in Jawi and Arabic will be used to identify out-of-vocabulary (OOV) problems in POS tagging.
Keywords
learning (artificial intelligence); natural language processing; text analysis; vocabulary; Al-Quran; Arabic language; Buckwalter symbols; Buckwalter transliteration; Jawi; Malay corpora; NLP algorithm; OOV; POS-tags; automated text analysis; lexical category assignment; machine learning; natural language processing algorithm; out-of-vocabulary problem; rule learning; Encoding; Wheels; Corpora; Jawi script
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Systems Design and Applications (ISDA), 2013 13th International Conference on
Conference_Location
Bangi
Print_ISBN
978-1-4799-3515-4
Type
conf
DOI
10.1109/ISDA.2013.6920737
Filename
6920737