Data matrix compression by using co-clustering

Author

Bo Han ; Zhenyu Yang

Author_Institution

Int. Sch. of Software, Wuhan Univ., Wuhan, China

Volume

4

fYear

2011

fDate

26-28 July 2011

Firstpage

2600

Lastpage

2604

Abstract

Two-dimensional data matrices are widely used in many applications. Lossless compression of a data matrix benefits both storage and network transmission. In this paper, we propose a novel data-mining-based compression approach consisting of three steps: reordering and grouping the columns and rows of the data matrix by co-clustering; post-processing to further expose redundancy in the data matrix; and compressing the data with a standard compressor. The inverse transform of co-clustering is fast and simple, which facilitates matrix decompression. We tested the approach on a synthetic dataset and five real-life UCI datasets. The experimental results suggest that our approach can improve compression rates by at least 24% and by up to 68%. The results also show that the time cost of the approach grows linearly with the size of the data matrix, making it faster than competing methods.
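
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: a simple lexicographic sort of rows and columns stands in for co-clustering, the post-processing step is omitted because the abstract does not specify it, and zlib stands in for the "standard compressor". The function names (`reorder`, `compress`, `decompress`) are hypothetical, and the matrix is assumed to be non-empty with integer entries in the range 0 to 255.

```python
import zlib


def reorder(matrix):
    """Group similar rows and columns together.

    Assumption: a lexicographic sort is used here as a stand-in for
    co-clustering; the paper's actual method is not reproduced.
    """
    # Permutation that sorts the rows by their contents.
    row_perm = sorted(range(len(matrix)), key=lambda i: matrix[i])
    rows = [matrix[i] for i in row_perm]
    # Permutation that sorts the columns of the row-sorted matrix.
    n_cols = len(matrix[0])
    col_perm = sorted(range(n_cols), key=lambda j: [r[j] for r in rows])
    reordered = [[r[j] for j in col_perm] for r in rows]
    return reordered, row_perm, col_perm


def compress(matrix):
    """Reorder the matrix, then pass it to a standard compressor (zlib)."""
    reordered, row_perm, col_perm = reorder(matrix)
    # Flatten row by row; each entry must fit in one byte (0..255).
    flat = bytes(v for row in reordered for v in row)
    return zlib.compress(flat), row_perm, col_perm


def decompress(payload, row_perm, col_perm):
    """Invert the transform: decompress, then undo both permutations."""
    flat = zlib.decompress(payload)
    n_cols = len(col_perm)
    out = [[0] * n_cols for _ in row_perm]
    for i, src_row in enumerate(row_perm):
        for j, src_col in enumerate(col_perm):
            # Entry (i, j) of the reordered matrix came from
            # entry (src_row, src_col) of the original.
            out[src_row][src_col] = flat[i * n_cols + j]
    return out


if __name__ == "__main__":
    original = [
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
    ]
    payload, row_perm, col_perm = compress(original)
    restored = decompress(payload, row_perm, col_perm)
    print("lossless round trip:", restored == original)
```

The inverse step only needs the two stored permutations, which is consistent with the abstract's remark that the inverse transform is fast and simple. Whether reordering actually reduces the compressed size depends on the data and on the quality of the clustering, so the sketch makes no claim about compression gains.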

Keywords

data compression; data mining; inverse transforms; pattern clustering; data matrix column grouping; data matrix compression; data matrix redundancy; data matrix row coclustering; data-mining-based compression approach; inverse transform; lossless compression; matrix uncompression; network transmission; reordering; standard compressor; two dimensional data matrix; Data compression; Data mining; Educational institutions; Image coding; Redundancy; Software; Transforms; co-clustering; data matrix; lossless compression; redundancy;

fLanguage

English

Publisher

IEEE

Conference_Titel

Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on

Conference_Location

Shanghai

Print_ISBN

978-1-61284-180-9

Type

conf

DOI

10.1109/FSKD.2011.6019940

Filename

6019940