Title : 
An Indexed Bottom-up Approach for Publishing Anonymized Data
         
        
            Author : 
Anh-Tu Hoang ; Minh-Triet Tran ; Anh-Duc Duong ; Echizen, Isao
         
        
            Author_Institution : 
Univ. of Sci., Ho Chi Minh City, Vietnam
         
        
        
        
        
        
            Abstract : 
Sharing information is one of the most important parts of social activity. However, sharing information can leak users' personal information, and removing all direct identifiers is not enough to prevent this. Sweeney proposed applying k-anonymity to protect users' identities from linking attacks. Sweeney's algorithm finds the optimal anonymized dataset under a minimal distortion metric. Other authors have proposed other optimal algorithms, but these remain impractical because of their high computational cost. Another approach is to release a minimal anonymized dataset by applying heuristics. Wang and Fung proposed Bottom-up Generalization and Top-down Specialization (TDS), which publish a minimal anonymized dataset under an information loss metric and are more efficient. However, these algorithms still have some limitations. In this paper, we propose an algorithm that publishes anonymized datasets using a bottom-up generalization approach and an information loss data metric. Our algorithm saves time by storing statistical information for later use. The experiments are performed on the Adult dataset, which was used to evaluate all of the former algorithms. Experimental results show that our algorithm can process a dataset of 949,662 records in 42.219 s. The classification error on anonymized data created by our algorithm is lower than that of Wang's algorithm by 3.8%.
         
        
            Keywords : 
electronic publishing; peer-to-peer computing; security of data; statistical analysis; Sweeney's algorithm; TDS; Wang algorithm; Adult dataset; anonymized data classification error; anonymized data publishing; bottom-up generalization; high computational cost; indexed bottom-up approach; information loss data metric; information sharing; k-anonymity; linking attack; minimal distortion metric; optimal anonymized dataset; social activities; statistical information storage; top-down specialization; users' identities protection; Data models; Data privacy; Measurement; Partitioning algorithms; Publishing; Taxonomy; Training; Bottom-up; anonymized data
         
        
        
        
            Conference_Title : 
Computational Intelligence and Security (CIS), 2012 Eighth International Conference on
         
        
            Conference_Location : 
Guangzhou
         
        
            Print_ISBN : 
978-1-4673-4725-9
         
        
        
            DOI : 
10.1109/CIS.2012.148