Title :
Text compression using several Huffman trees
Author_Institution :
Sch. of Comput. & Syst. Sci., Jawaharlal Nehru Univ., New Delhi, India
Abstract :
Summary form only given. Noticing that a Huffman code is independent of the order in which the characters appear in the text, the author views the source text as columns of characters, where words appear as rows. Character frequency tables are compiled with respect to character position within the words, i.e., one frequency table is constructed for the first character of all the words, another for the second character, and so on. Word delimiters (e.g., a space or punctuation mark) are used as word endings. Position-dependent code tables, one for each column, are computed using the Huffman algorithm. The text is scanned from left to right, and each character is replaced by its code from the corresponding table; a delimiter serves to initialize (reset) the correspondence. Using several coding trees, one obtains a greater degree of compression without performing clustering computations.
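The scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract gives no details on the delimiter set, on tie-breaking inside the Huffman construction, or on how the code tables are stored, so the names `DELIMITERS`, `huffman_code`, `positions`, `build_tables`, `encode`, and `decode` are all hypothetical, and the delimiter set is an assumption. A delimiter is coded with the table of the column in which the word ends, and the column counter is then reset to zero.

```python
import heapq
from collections import Counter
from itertools import count

# Assumed delimiter set; the abstract only says "a space or punctuation mark".
DELIMITERS = set(" \n\t.,;:!?")

def huffman_code(freqs):
    """Build a prefix code {symbol: bitstring} from a frequency mapping."""
    if len(freqs) == 1:
        # A one-symbol alphabet still needs a one-bit code word.
        return {next(iter(freqs)): "0"}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

def positions(text):
    """Yield (column, char); the column resets to 0 after a delimiter."""
    col = 0
    for ch in text:
        yield col, ch
        col = 0 if ch in DELIMITERS else col + 1

def build_tables(text):
    """Compile one frequency table per column, then one Huffman code each."""
    freqs = {}
    for col, ch in positions(text):
        freqs.setdefault(col, Counter())[ch] += 1
    return {col: huffman_code(f) for col, f in freqs.items()}

def encode(text, tables):
    """Scan left to right, substituting the code from the column's table."""
    return "".join(tables[col][ch] for col, ch in positions(text))

def decode(bits, tables):
    """Invert encode(); the decoder tracks the column the same way."""
    inverse = {c: {b: s for s, b in t.items()} for c, t in tables.items()}
    out, col, buf = [], 0, ""
    for bit in bits:
        buf += bit
        ch = inverse[col].get(buf)
        if ch is not None:
            out.append(ch)
            buf = ""
            col = 0 if ch in DELIMITERS else col + 1
    return "".join(out)

if __name__ == "__main__":
    sample = "the cat sat on the mat. the hat sat."
    tables = build_tables(sample)
    bits = encode(sample, tables)
    assert decode(bits, tables) == sample
    print(len(bits), "bits for", len(sample), "characters")
```

The sketch only counts the coded bits; it ignores the cost of storing one code table per column, which a practical comparison against a single Huffman tree would need to include.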
Keywords :
codes; data compression; trees (mathematics); Huffman code; Huffman trees; character frequency tables; coding trees; delimiter; text compression; Clustering algorithms; Decoding; Encoding; Frequency; Greedy algorithms; Heuristic algorithms; Huffman coding; Information theory; Partitioning algorithms; Performance analysis;
Conference_Title :
Data Compression Conference, 1991. DCC '91.
Conference_Location :
Snowbird, UT
Print_ISBN :
0-8186-9202-2
DOI :
10.1109/DCC.1991.213309