Title :
A new approach to sort Unicode Bengali text
Author :
Rahman, Md Ahsanur ; Sattar, Md Abdus
Author_Institution :
Dept. of CSE, Bangladesh Univ. of Eng. & Technol., Dhaka
Abstract :
The character order of Bengali in Unicode differs from the sorting order suggested by the governing authority. As a result, a simple letter-by-letter comparison of code points does not yield the correct order of Bengali words. The presence of modifier characters in Bengali complicates the situation further. The objective of our study is to adopt the suggested collation order for Unicode-encoded Bengali text while achieving the maximum possible efficiency, and we propose an algorithm for this purpose. The proposed algorithm is applicable to any chosen sorting order. It also compares words in O(1) time, irrespective of their lengths, so the complexity of sorting texts is always O(n log n).
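The mismatch described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it builds a per-character rank table from a chosen order and compares words through that table, which takes time proportional to word length rather than the O(1) claimed for the proposed method. The names `ASSUMED_ORDER` and `make_sort_key` are hypothetical, and the order shown is a small illustrative subset, not the authority's collation table.

```python
# Hypothetical, partial collation order chosen only for illustration:
# a few vowels, a few consonants, then the sign characters.
ASSUMED_ORDER = (
    "\u0985\u0986\u0987"              # vowels: A, AA, I
    "\u0995\u0996\u0997\u09b2\u09b6"  # consonants: KA, KHA, GA, LA, SHA
    "\u0982\u0983\u0981"              # signs: ANUSVARA, VISARGA, CANDRABINDU
)

def make_sort_key(order):
    """Return a key function that ranks characters by their position in `order`."""
    rank = {ch: i for i, ch in enumerate(order)}
    fallback = len(order)  # unlisted characters sort after all listed ones

    def key(word):
        # Unlisted characters keep their relative code point order.
        return [rank.get(ch, fallback + ord(ch)) for ch in word]

    return key

bengali_key = make_sort_key(ASSUMED_ORDER)

words = [
    "\u0985\u0995\u09be\u09b2",  # A, KA, vowel sign AA, LA
    "\u0985\u0982\u09b6",        # A, ANUSVARA, SHA
]

# Plain comparison follows code points, where ANUSVARA (U+0982)
# precedes KA (U+0995).
by_code_point = sorted(words)

# Comparison through the rank table follows the chosen order instead.
by_assumed_order = sorted(words, key=bengali_key)
```

Because the order is passed in as data, the same key function works for any chosen sorting order; only the table changes.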
Keywords :
computational complexity; natural language processing; text analysis; O(n log n); character order; Unicode Bengali text; Dictionaries; Natural languages; Sorting; Standardization
Conference_Title :
Electrical and Computer Engineering, 2008. ICECE 2008. International Conference on
Conference_Location :
Dhaka
Print_ISBN :
978-1-4244-2014-8
Electronic_ISBN :
978-1-4244-2015-5
DOI :
10.1109/ICECE.2008.4769285