DocumentCode
3329397
Title
Morphology based text compression
Author
Göksu, Hayriye ; Diri, Banu
fYear
2010
fDate
22-24 April 2010
Firstpage
45
Lastpage
48
Abstract
With the rapid growth of online information, the number of documents in electronic media has increased considerably. Easy and quick access to this information makes text compression increasingly important. In recent years, part of the work in the field of text compression has focused on the morphological structure of the language. In this study, Turkish and English documents are compressed using different decomposition methods, and the effect of these methods on compression efficiency is investigated. The documents are first parsed according to their morphological structure. In the next stage, the parsed documents are compressed with the Huffman compression method. As a result, 10 different parsing techniques were created and tested on a different corpus.
Keywords
data compression; grammars; natural language processing; text analysis; English document; Huffman compression method; Turkish document; electronic media; morphological structure; morphology based text compression; Computers; Conferences; Data compression; Entropy; Information technology; Markov processes; Morphology;
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal Processing and Communications Applications Conference (SIU), 2010 IEEE 18th
Conference_Location
Diyarbakir
Print_ISBN
978-1-4244-9672-3
Type
conf
DOI
10.1109/SIU.2010.5651231
Filename
5651231