Title :
Data matrix compression by using co-clustering
Author :
Bo Han ; Zhenyu Yang
Author_Institution :
Int. Sch. of Software, Wuhan Univ., Wuhan, China
Abstract :
Two-dimensional data matrices are widely used in many applications. Lossless compression of a data matrix benefits both storage and network transmission. In this paper, we propose a novel data-mining-based compression approach consisting of three steps: (1) reordering and grouping the columns and rows of the data matrix by co-clustering; (2) post-processing to further expose redundancy in the data matrix; and (3) compressing the data with a standard compressor. The inverse transform of co-clustering is fast and simple, which facilitates matrix decompression. We tested the approach on a synthetic dataset and five real-life UCI datasets. The experimental results suggest that our approach can improve compression rates by at least 24% and up to 68%. The results also show that the time cost of the approach is linear in the data matrix size, making it faster than competing methods.
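The reorder-then-compress idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. A lexicographic sort of rows and columns stands in for co-clustering, the post-processing step is omitted because the abstract does not specify it, and zlib is assumed as the standard compressor. The function names (reorder, restore, compress_matrix, decompress_matrix) are hypothetical, and matrix entries are assumed to be integers in the range 0 to 255.

```python
import zlib


def reorder(matrix):
    """Stand-in for co-clustering (assumption): sort rows, then columns,
    so that identical or similar rows and columns become adjacent."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_perm = sorted(range(n_rows), key=lambda i: matrix[i])
    col_perm = sorted(
        range(n_cols), key=lambda j: [matrix[i][j] for i in row_perm]
    )
    reordered = [[matrix[i][j] for j in col_perm] for i in row_perm]
    return reordered, row_perm, col_perm


def restore(reordered, row_perm, col_perm):
    """Inverse transform: put every entry back at its original position."""
    original = [[0] * len(col_perm) for _ in range(len(row_perm))]
    for new_i, old_i in enumerate(row_perm):
        for new_j, old_j in enumerate(col_perm):
            original[old_i][old_j] = reordered[new_i][new_j]
    return original


def compress_matrix(matrix):
    """Reorder, flatten row by row, and compress with zlib.
    The two permutations must be kept to undo the reordering."""
    reordered, row_perm, col_perm = reorder(matrix)
    payload = bytes(value for row in reordered for value in row)
    return zlib.compress(payload, 9), row_perm, col_perm


def decompress_matrix(blob, row_perm, col_perm):
    """Decompress, rebuild the reordered rows, and invert the permutations."""
    flat = zlib.decompress(blob)
    n_cols = len(col_perm)
    reordered = [
        list(flat[k * n_cols:(k + 1) * n_cols]) for k in range(len(row_perm))
    ]
    return restore(reordered, row_perm, col_perm)
```

In this sketch the inverse transform is a single pass over the matrix using the stored row and column permutations, which is consistent with the abstract's remark that undoing the reordering is cheap. Whether a simple sort exposes as much redundancy as co-clustering is not claimed here.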
Keywords :
data compression; data mining; inverse transforms; pattern clustering; data matrix column grouping; data matrix compression; data matrix redundancy; data matrix row coclustering; data-mining-based compression approach; inverse transform; lossless compression; matrix uncompression; network transmission; reordering; standard compressor; two dimensional data matrix; Data compression; Data mining; Educational institutions; Image coding; Redundancy; Software; Transforms; co-clustering; data matrix; lossless compression; redundancy;
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-61284-180-9
DOI :
10.1109/FSKD.2011.6019940