• DocumentCode
    3329397
  • Title
    Morphology based text compression
  • Author
    Göksu, Hayriye ; Diri, Banu
  • fYear
    2010
  • fDate
    22-24 April 2010
  • Firstpage
    45
  • Lastpage
    48
  • Abstract
    With the rapid growth of online information, the number of documents in electronic media has increased considerably. Easy and quick access to this information makes text compression increasingly important. In recent years, part of the work in the field of text compression has focused on the morphological structure of the language. In this study, Turkish and English documents are compressed using different decomposition (parsing) methods, and the effect of each method on compression efficiency is investigated. The documents are first parsed according to their morphological structure. In the next stage, the parsed documents are compressed with the Huffman compression method. In total, 10 different parsing techniques were created, and experiments with them were carried out on a different corpus.
  • Keywords
    data compression; grammars; natural language processing; text analysis; English document; Huffman compression method; Turkish document; electronic media; morphological structure; morphology based text compression; Computers; Conferences; Data compression; Entropy; Information technology; Markov processes; Morphology;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing and Communications Applications Conference (SIU), 2010 IEEE 18th
  • Conference_Location
    Diyarbakir
  • Print_ISBN
    978-1-4244-9672-3
  • Type
    conf
  • DOI
    10.1109/SIU.2010.5651231
  • Filename
    5651231
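
The two-stage pipeline described in the abstract (morphological parsing followed by Huffman coding of the parsed units) can be illustrated with a minimal sketch. This is not the authors' implementation: the suffix list, the `+suffix` token convention, and all function names (`split_word`, `parse`, `huffman_code`, `encode`, `decode`) are assumptions made for illustration, and a real system would use a proper morphological analyzer for Turkish or English in place of the toy suffix stripper.

```python
import heapq
from collections import Counter
from itertools import count

# Hypothetical toy suffix list (assumption); a real morphological
# analyzer would replace this naive suffix stripping.
SUFFIXES = ("ing", "ed", "s")


def split_word(word):
    """Split a word into [stem, '+suffix'] if it ends with a known suffix."""
    for suf in SUFFIXES:
        # Require a stem of at least three characters.
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return [word[:-len(suf)], "+" + suf]
    return [word]


def parse(text):
    """Turn text into a flat list of stem and suffix tokens."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(split_word(word))
    return tokens


def huffman_code(tokens):
    """Build a Huffman code (token -> bit string) from token frequencies."""
    freq = Counter(tokens)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(f, next(tie), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Prefix the two least frequent subtrees with 0 and 1, then merge.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(tokens, code):
    """Concatenate the code words of all tokens."""
    return "".join(code[t] for t in tokens)


def decode(bits, code):
    """Recover the token list from a bit string using the prefix-free code."""
    inverse = {v: k for k, v in code.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return out


tokens = parse("walked walking walks")
code = huffman_code(tokens)
bits = encode(tokens, code)
```

In this sketch the shared stem `walk` becomes the most frequent symbol and so receives the shortest code word, which is the intuition behind coding morphological units rather than whole words. Whether this helps in practice depends on the parsing technique and the corpus, which is what the study investigates.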