DocumentCode :
1924586
Title :
Generation of Prototypes for Masking Sequences of Events
Author :
Valls, Aida ; Gomez-Alonso, C. ; Torra, Vicenc
Author_Institution :
Dept. Comput. Sci. & Math., Univ. Rovira i Virgili, Tarragona
fYear :
2009
fDate :
16-19 March 2009
Firstpage :
947
Lastpage :
952
Abstract :
Sequences of categorical data are commonly used to represent sequences of events. In order to transfer such data to third parties for analysis, masking methods can be applied to satisfy privacy laws and avoid the disclosure of sensitive information. Masking methods distort the data so that privacy is preserved at the expense of some information loss. Different methods exist, each one trying to find a good trade-off between the risk of disclosure and the information loss. Microaggregation is one of the existing masking methods. In microaggregation, small clusters are automatically built and the values of the members of a cluster are substituted by the values of the prototype of that cluster. Because microaggregation is an NP-hard problem, heuristic approaches have been developed. Existing methods are mainly devoted to numerical and categorical data. The extension of these methods to sequences of categorical data requires the definition of special algorithms for clustering and prototyping. Artificial Intelligence offers techniques and tools that are appropriate for symbolic data. As the sequences in our context are defined in terms of categorical (symbolic) values, such AI techniques are of special relevance. In this paper, we use them to propose a new method for generating the prototype of a small group of sequences of categorical values. These results can later be used in, e.g., microaggregation.
Keywords :
artificial intelligence; computational complexity; pattern clustering; security of data; NP-hard problem; artificial intelligence; categorical data sequences; heuristic approach; masking methods; microaggregation; symbolic data; Artificial intelligence; Data analysis; Data mining; Data privacy; Information analysis; Internet; Loss measurement; Proteins; Prototypes; Sequences; Similarity measures; Statistical Disclosure Control; Time Sequence Data;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Availability, Reliability and Security, 2009. ARES '09. International Conference on
Conference_Location :
Fukuoka
Print_ISBN :
978-1-4244-3572-2
Electronic_ISBN :
978-0-7695-3564-7
Type :
conf
DOI :
10.1109/ARES.2009.55
Filename :
5066592