  • DocumentCode
    3012188
  • Title
    On the design of an effective corpus for evaluation of Bengali Text Compression Schemes
  • Author
    Islam, Md Rafiqul ; Rajon, S. A. Ahsan
  • Author_Institution
    Comput. Sci. & Eng. Discipline, Khulna Univ., Khulna
  • fYear
    2008
  • fDate
    24-27 Dec. 2008
  • Firstpage
    236
  • Lastpage
    241
  • Abstract
    In this paper, we propose an effective platform for evaluating Bengali text compression schemes. We methodically study approaches to formulating text corpora for data compression and present a corpus named Ekushe-Khul for evaluating Bengali text compression schemes, the first such initiative in the context of Bengali text compression. In designing the corpus, we use the type-to-token ratio as the primary selection criterion, along with a number of secondary considerations. The paper also presents a mathematical analysis relating data compression performance to the structural aspects of corpora. The proposed corpus is effective for evaluating the compression efficiency of small and medium-sized text files.
  • Keywords
    data compression; natural languages; text analysis; Bengali text compression scheme evaluation; corpus design; mathematical analysis; type-to-token ratio; Computer science; Costs; Data compression; Design engineering; Dictionaries; Image coding; Information technology; Mathematical analysis; Performance analysis; Performance evaluation; Bengali Text; Bengali Text Compression; Compression Efficiency; Corpus; Data Management; Dictionary Coding; Evaluation Platform; Type to Token Ratio (TTR);
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computer and Information Technology, 2008. ICCIT 2008. 11th International Conference on
  • Conference_Location
    Khulna
  • Print_ISBN
    978-1-4244-2135-0
  • Electronic_ISBN
    978-1-4244-2136-7
  • Type
    conf
  • DOI
    10.1109/ICCITECHN.2008.4802992
  • Filename
    4802992