Title :
Proposed Myanmar Word Tokenizer based on LIPIDIPIKAR treatise
Author :
Thwin, Thein Than ; Win, Aye Thida ; Wai, Phyo Phyo ; Thwin, Mie Mie Su
Author_Institution :
Univ. of Comput. Studies, Mandalay, Myanmar
Abstract :
Natural Language Processing (NLP) technologies are becoming increasingly important, and future intelligent systems will rely on them more heavily as the technology improves rapidly. Asia, however, is a difficult region for NLP because of its linguistic diversity, and many Asian languages are inadequately supported on computers. Myanmar is an analytic language, but its script includes special characters such as the killer and medials. In English and other European languages, every syllable is formed by combining letters that represent only consonants and vowels, whereas Myanmar uses compound syllables, which are more difficult to analyze. This makes word sorting difficult. The proposed system transforms the condensed form of ordinary Myanmar script into an analyzable elaborated form, based on the LIPIDIPIKAR treatise written by Yaw Min Gyi U Pho Hlaing. The elaborated words can then be sorted easily using this treatise. We also compare the complexity of sorting condensed Myanmar words with that of sorting elaborated words.
Keywords :
natural language processing; Asian languages; English; European languages; LIPIDIPIKAR treatise; Myanmar ordinary scripts; Myanmar word tokenizer; intelligent systems; linguistic diversity; Asia; Databases; Diversity reception; Natural languages; Sorting; Speech synthesis; Transducers; Writing; Condensed form; Elaborated form; NLP; Phonetic token; Unicode
Conference_Title :
Computer Engineering and Technology (ICCET), 2010 2nd International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-6347-3
DOI :
10.1109/ICCET.2010.5485313