• DocumentCode
    2121997
  • Title

    A Disc-based Approach to Data Summarization and Privacy Preservation

  • Author

    Ge, Rong ; Ester, Martin ; Jin, Wen ; Hu, Zengjian

  • Author_Institution
    Simon Fraser Univ., Burnaby, BC
  • fYear
    2006
  • fDate
    0-0 0
  • Firstpage
    321
  • Lastpage
    332
  • Abstract
    Data summarization has been recognized as a fundamental operation in database systems and data mining, with important applications such as data compression and privacy preservation. While existing methods such as CF-values and Data Bubbles may perform reasonably well, they cannot provide any guarantees on the quality of their results. In this paper, we introduce a summarization approach for numerical data based on discs formalizing the notion of quality. Our objective is to find a minimal set of discs, i.e., spheres satisfying a radius and a significance constraint, covering the given dataset. Since the proposed problem is NP-complete, we design two different approximation algorithms. These algorithms have a quality guarantee, but they do not scale well to large databases. However, the machinery from approximation algorithms allows a precise characterization of a further, heuristic algorithm. This efficient heuristic algorithm exploits multi-dimensional index structures and can be well integrated with database systems. The experiments show that our heuristic algorithm generates summaries that outperform the state-of-the-art Data Bubbles in terms of internal measures as well as external measures when the data summaries are used as input for clustering methods.
  • Keywords
    computational complexity; data compression; data mining; data privacy; database indexing; disc storage; pattern clustering; security of data; NP-complete problem; approximation algorithm; data bubbles; data clustering; data privacy preservation; data summarization; database system; disc-based approach; heuristic algorithm; multidimensional index structure; Algorithm design and analysis; Approximation algorithms; Clustering algorithms; Data compression; Data mining; Data privacy; Database systems; Heuristic algorithms; Indexes; Machinery
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Scientific and Statistical Database Management, 2006. 18th International Conference on
  • Conference_Location
    Vienna
  • ISSN
    1551-6393
  • Print_ISBN
    0-7695-2590-3
  • Type

    conf

  • DOI
    10.1109/SSDBM.2006.6
  • Filename
    1644329