• DocumentCode
    1773309
  • Title

    Developing an efficient algorithm for representation and compression of large Bengali text

  • Author

    Marjan, Md Abu ; Uddin, Md Palash ; Ibn Afjal, Masud ; Haque, Md Dulal

  • Author_Institution
    Fac. of Comput. Sci. & Eng., Hajee Mohammad Danesh Sci. & Technol. Univ., Dinajpur, Bangladesh
  • fYear
    2014
  • fDate
    21-23 Oct. 2014
  • Firstpage
    22
  • Lastpage
    25
  • Abstract
    Efficient coding is one of the challenging aspects of information and communication theory. Natural languages such as Bengali are encoded in Unicode, which requires more storage space and therefore more time to transfer text in that language. In this paper, we propose a novel algorithm that represents Bengali text efficiently and then compresses it with a better compression ratio. Each Bengali character is represented by a unique 2-digit intermediate decimal value. After the word values are indexed and sorted, successive subtraction is performed on them to reduce the magnitude of the numbers. The new value of each word can then be encoded with very few bits. Compared with other compressors, the compression ratio of the proposed algorithm decreases substantially for large texts, which tend to contain more duplicate or redundant words, more words of the same length, and more words of the same length sharing a prefix (called Uposorgo in Bengali).
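
    The pipeline outlined in the abstract (2-digit character codes, sorted word values, successive subtraction) can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual character table, indexing scheme, or bit-level encoding, so everything below is an assumption: the function names are hypothetical, the code table is built from the input in code-point order starting at 10, and duplicates are simply dropped before sorting.

```python
from itertools import accumulate


def build_code_table(text):
    """Map each distinct non-whitespace character to a 2-digit code (10..99).

    Assumption: the paper's real table is unknown; codes here follow sorted
    code-point order. Starting at 10 avoids leading zeros in word values.
    """
    chars = sorted(set(text) - set(" \n\t"))
    if len(chars) > 90:
        raise ValueError("more than 90 distinct characters; 2 digits are not enough")
    return {ch: 10 + i for i, ch in enumerate(chars)}


def word_value(word, table):
    """Concatenate the 2-digit codes of a word into one decimal integer."""
    return int("".join(str(table[ch]) for ch in word))


def value_to_word(value, table):
    """Invert word_value by splitting the digits into pairs."""
    inverse = {code: ch for ch, code in table.items()}
    digits = str(value)
    return "".join(inverse[int(digits[i:i + 2])] for i in range(0, len(digits), 2))


def delta_encode(values):
    """Sort the distinct values and keep successive differences."""
    ordered = sorted(set(values))
    if not ordered:
        return []
    return [ordered[0]] + [b - a for a, b in zip(ordered, ordered[1:])]


def delta_decode(deltas):
    """Recover the sorted values with a running sum."""
    return list(accumulate(deltas))


text = "কলম কলা কলম বই"
table = build_code_table(text)
values = [word_value(w, table) for w in text.split()]
deltas = delta_encode(values)
restored = [value_to_word(v, table) for v in delta_decode(deltas)]
print(deltas)
print(restored)
```

    Words of the same length that share a prefix map to nearby integers, so their differences are small; this is the property the abstract relies on. The sketch does not record word order or duplicate positions, which a full compressor would also need to store.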
  • Keywords
    data compression; indexing; information theory; natural language processing; sorting; text analysis; 2-digit intermediate decimal value; communication theory; efficient coding; large Bengali text compression; large Bengali text representation; natural languages; Unicode technology; Compounds; Compressors; Computers; Encoding; Indexes; Standards; Bengali text compression; Bengali text representation; Compression; Decompression; compression ratio
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Strategic Technology (IFOST), 2014 9th International Forum on
  • Conference_Location
    Cox's Bazar
  • Print_ISBN
    978-1-4799-6060-6
  • Type

    conf

  • DOI
    10.1109/IFOST.2014.6991063
  • Filename
    6991063