DocumentCode :
2541864
Title :
A new approach to sort Unicode Bengali text
Author :
Rahman, Md Ahsanur ; Sattar, Md Abdus
Author_Institution :
Dept. of CSE, Bangladesh Univ. of Eng. & Technol., Dhaka
fYear :
2008
fDate :
20-22 Dec. 2008
Firstpage :
628
Lastpage :
630
Abstract :
The character order in Unicode for Bengali differs from the sorting order suggested by the governing authority. As a result, a simple letter-by-letter comparison does not yield the correct order of Bengali words. The presence of modifier characters in Bengali complicates the situation further. The objective of our study is to adapt the suggested collation order for Unicode-encoded Bengali text while achieving the maximum possible efficiency. We propose an algorithm for this purpose. The proposed algorithm is applicable to any chosen sorting order. It also compares words in O(1) time, irrespective of their lengths. Thus, the complexity of sorting text is always O(n log n).
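The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the underlying problem, not the authors' method. It uses a generic key-based sort, assuming a small, hypothetical collation table (`COLLATION_ORDER`) covering five letters; the names `RANK`, `collation_key`, and `sort_words` are illustrative. It does not handle modifier characters and does not reproduce the O(1) word comparison claimed above.

```python
# Hypothetical collation order for a handful of Bengali letters.
# This is an illustration only, NOT the authority's official table.
COLLATION_ORDER = [
    "\u09A1",  # DDA
    "\u09DC",  # RRA (encoded far after DDA in Unicode)
    "\u09A2",  # DDHA
    "\u09DD",  # RHA (encoded far after DDHA in Unicode)
    "\u09A3",  # NNA
]

# Map each character to its position in the chosen order.
RANK = {ch: i for i, ch in enumerate(COLLATION_ORDER)}


def collation_key(word):
    """Build a sort key from per-character ranks.

    Characters missing from the table sort after all known ones,
    ordered by code point (an assumption made for this sketch).
    """
    return [(0, RANK[ch]) if ch in RANK else (1, ord(ch)) for ch in word]


def sort_words(words):
    """Sort words by the custom collation order instead of code points."""
    return sorted(words, key=collation_key)


words = ["\u09A3", "\u09DC", "\u09A2", "\u09A1", "\u09DD"]
by_code_point = sorted(words)      # plain letter-by-letter comparison
by_collation = sort_words(words)   # comparison under the chosen order
```

Here `by_code_point` places U+09DC and U+09DD last because of their code points, while `by_collation` places each directly after its base letter. Swapping in a different `COLLATION_ORDER` changes the result without touching the sorting code, which mirrors the abstract's point that the order can be chosen freely.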
Keywords :
computational complexity; natural language processing; text analysis; O(n log n); character order; Unicode Bengali text; Dictionaries; Natural languages; Sorting; Standardization
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Electrical and Computer Engineering, 2008. ICECE 2008. International Conference on
Conference_Location :
Dhaka
Print_ISBN :
978-1-4244-2014-8
Electronic_ISBN :
978-1-4244-2015-5
Type :
conf
DOI :
10.1109/ICECE.2008.4769285
Filename :
4769285