• DocumentCode
    3312070
  • Title
    Data matrix compression by using co-clustering
  • Author
    Bo Han ; Zhenyu Yang
  • Author_Institution
    Int. Sch. of Software, Wuhan Univ., Wuhan, China
  • Volume
    4
  • fYear
    2011
  • fDate
    26-28 July 2011
  • Firstpage
    2600
  • Lastpage
    2604
  • Abstract
    Two-dimensional data matrices are widely used in many applications. Lossless compression of a data matrix benefits both storage and network transmission. In this paper, we propose a novel data-mining-based compression approach consisting of three steps: reordering and grouping the columns and rows of the data matrix by co-clustering; post-processing to further expose redundancy in the data matrix; and compressing the result with a standard compressor. The inverse transform of co-clustering is fast and simple, which facilitates matrix decompression. We tested the approach on a synthetic dataset and five real-life UCI datasets. The experimental results suggest that our approach improves compression rates by at least 24% and up to 68%. The results also show that the time cost of the approach is linearly proportional to the data matrix size, making it faster than competing methods.
  • Keywords
    data compression; data mining; inverse transforms; pattern clustering; data matrix column grouping; data matrix compression; data matrix redundancy; data matrix row coclustering; data-mining-based compression approach; inverse transform; lossless compression; matrix uncompression; network transmission; reordering; standard compressor; two dimensional data matrix; Data compression; Data mining; Educational institutions; Image coding; Redundancy; Software; Transforms; co-clustering; data matrix; lossless compression; redundancy;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-61284-180-9
  • Type
    conf
  • DOI
    10.1109/FSKD.2011.6019940
  • Filename
    6019940
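
The reorder-then-compress pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: sorting rows and columns by their sums is a simple stand-in for co-clustering (an assumption made here for brevity), the post-processing step is omitted, and `zlib` plays the role of the standard compressor. The function names `compress_matrix` and `decompress_matrix` are hypothetical. The sketch only shows why the inverse transform is cheap: storing the two permutations is enough to restore the original layout exactly.

```python
import zlib


def compress_matrix(matrix):
    """Reorder a matrix of byte values (0..255), then compress it losslessly.

    Assumption: sorting by row/column sums stands in for co-clustering.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Permutations that group similar-looking rows and columns together.
    row_order = sorted(range(n_rows), key=lambda i: sum(matrix[i]))
    col_order = sorted(range(n_cols), key=lambda j: sum(row[j] for row in matrix))
    # Serialize the reordered matrix row by row and hand it to zlib.
    flat = bytes(matrix[i][j] for i in row_order for j in col_order)
    return zlib.compress(flat, 9), row_order, col_order


def decompress_matrix(payload, row_order, col_order):
    """Invert compress_matrix: decompress, then undo both permutations."""
    flat = zlib.decompress(payload)
    n_cols = len(col_order)
    restored = [[0] * n_cols for _ in row_order]
    for a, i in enumerate(row_order):
        for b, j in enumerate(col_order):
            # Position (a, b) in the reordered matrix came from (i, j).
            restored[i][j] = flat[a * n_cols + b]
    return restored


# Usage: the round trip must return the original matrix unchanged.
example = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
payload, rows, cols = compress_matrix(example)
assert decompress_matrix(payload, rows, cols) == example
```

The permutations must be stored alongside the compressed payload, so any gain from reordering has to outweigh that overhead; the sketch makes no claim about the compression rates reported in the abstract.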