DocumentCode
1773309
Title
Developing an efficient algorithm for representation and compression of large Bengali text
Author
Marjan, Md Abu ; Uddin, Md Palash ; Ibn Afjal, Masud ; Haque, Md Dulal
Author_Institution
Fac. of Comput. Sci. & Eng., Hajee Mohammad Danesh Sci. & Technol. Univ., Dinajpur, Bangladesh
fYear
2014
fDate
21-23 Oct. 2014
Firstpage
22
Lastpage
25
Abstract
Efficient coding is one of the challenging aspects of information and communication theory. Natural languages such as Bengali are encoded using Unicode, which requires more space and therefore more time to transfer text in those languages. In this paper, we propose a novel algorithm to represent Bengali text efficiently and then compress it with a better compression ratio. Each Bengali character is represented by a unique 2-digit intermediate decimal value. After indexing and sorting all the word values, successive subtraction is performed on them in the hope of reducing the magnitude of the numbers. The new value of each word can then be encoded with very few bits. Compared with other compressors, the compression ratio of the proposed algorithm decreases substantially for large texts, which may contain more duplicate or redundant words, more words of the same length, and more words of the same length sharing a prefix (called Uposorgo in Bengali).
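The pipeline outlined in the abstract (2-digit character codes, word values, indexing, sorting, successive subtraction) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give the actual character-to-code table, so the sketch assumes codes 10 to 99 assigned in code-point order to the characters that occur in the input, and the names `build_code_table`, `encode` and `decode` are hypothetical. The final step of packing the differences into bits is not shown.

```python
from itertools import accumulate


def build_code_table(words):
    """Assign each distinct character a 2-digit code.

    Assumption: codes run from 10 to 99 in code-point order, so no code
    has a leading zero. The paper's real table is not given in this record.
    """
    chars = sorted(set("".join(words)))
    if len(chars) > 90:
        raise ValueError("more than 90 distinct characters; 2 digits are not enough")
    return {ch: i + 10 for i, ch in enumerate(chars)}


def encode(text):
    """Return (code table, successive differences, word index) for a text."""
    words = text.split()
    table = build_code_table(words)
    # Word value: the 2-digit codes of its characters, concatenated.
    values = [int("".join(str(table[ch]) for ch in w)) for w in words]
    # Duplicates collapse here; sorting places similar values side by side.
    ordered = sorted(set(values))
    # Successive subtraction: first value, then each gap to the previous one.
    deltas = [b - a for a, b in zip([0] + ordered, ordered)]
    # Index: position of every original word in the sorted value list.
    positions = {v: i for i, v in enumerate(ordered)}
    index = [positions[v] for v in values]
    return table, deltas, index


def decode(table, deltas, index):
    """Rebuild the word sequence from the output of encode()."""
    inverse = {code: ch for ch, code in table.items()}
    ordered = list(accumulate(deltas))  # running sum undoes the subtraction
    words = []
    for i in index:
        digits = str(ordered[i])
        pairs = (digits[k:k + 2] for k in range(0, len(digits), 2))
        words.append("".join(inverse[int(p)] for p in pairs))
    return words


text = "বাংলা ভাষা বাংলা বই"
table, deltas, index = encode(text)
print(deltas, index)
print(decode(table, deltas, index) == text.split())
```

In this sketch, repeated words add only an index entry, and words of equal length that share a prefix yield values with the same leading digits, so the gaps between neighbouring sorted values tend to be small. That matches the abstract's explanation of why larger texts compress better.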
Keywords
data compression; indexing; information theory; natural language processing; sorting; text analysis; 2-digit intermediate decimal value; communication theory; efficient coding; large Bengali text compression; large Bengali text representation; natural languages; Unicode technology; Compounds; Compressors; Computers; Encoding; Indexes; Standards; Bengali text compression; Bengali text representation; Compression; Decompression; compression ratio
fLanguage
English
Publisher
IEEE
Conference_Titel
Strategic Technology (IFOST), 2014 9th International Forum on
Conference_Location
Cox's Bazar
Print_ISBN
978-1-4799-6060-6
Type
conf
DOI
10.1109/IFOST.2014.6991063
Filename
6991063