Title :
Electronic Medical Records privacy preservation through k-anonymity clustering method
Author :
Moonshik Shin ; Sunyong Yoo ; Lee, K.H. ; Doheon Lee
Author_Institution :
Dept. of Bio & Brain Eng., KAIST, Daejeon, South Korea
Abstract :
Electronic Medical Records (EMRs) enable patient medical data to be shared whenever it is needed, and they also serve as a resource for building new medical technology and patient recommendation systems. Because EMRs include patients' private data, researchers' access to them is restricted. An anonymization technique is therefore needed that keeps patients' private data safe without damaging useful medical information. k-member clustering anonymization treats k-anonymization as a clustering problem. The objective of the k-member clustering problem is to group records so that the data distortion introduced by generalization is minimized. Most previous clustering techniques select the cluster seed at random, which yields inconsistent performance. The authors propose a k-member cluster seed selection algorithm (KMCSSA) that differs from previous clustering methods: instead of selecting a cluster seed at random, it selects the seed based on closeness centrality, in order to provide consistent information loss (IL) and to reduce information distortion. The Adult database from the University of California, Irvine (UCI) Machine Learning Repository was used for the experiment. In a comparison with two previous methods, the experimental results show that KMCSSA provides superior performance with respect to information loss. The authors thus provide a privacy protection algorithm that yields consistent information loss and reduces the overall information distortion.
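The seed-selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' KMCSSA, whose details the abstract does not give. Everything here is an assumption made for illustration: records are purely numeric, distance is a range-normalized Manhattan distance, "closeness centrality" is approximated by choosing the remaining record with the smallest total distance to the other remaining records, each cluster is filled with the seed's k-1 nearest records, and information loss is the size-weighted sum of normalized attribute ranges per cluster. All function names are hypothetical.

```python
from typing import List, Sequence

Record = Sequence[float]


def attribute_ranges(records: List[Record]) -> List[float]:
    """Range (max - min) of each attribute; 1.0 is used if an attribute is constant."""
    return [(max(col) - min(col)) or 1.0 for col in zip(*records)]


def distance(a: Record, b: Record, ranges: List[float]) -> float:
    """Range-normalized Manhattan distance (an assumed metric, not from the paper)."""
    return sum(abs(x - y) / r for x, y, r in zip(a, b, ranges))


def closeness_seed(candidates: List[int], records: List[Record],
                   ranges: List[float]) -> int:
    """Index of the candidate with the smallest total distance to the others,
    i.e. the highest closeness under this sketch's definition."""
    def total(i: int) -> float:
        return sum(distance(records[i], records[j], ranges)
                   for j in candidates if j != i)
    return min(candidates, key=total)


def k_member_clusters(records: List[Record], k: int) -> List[List[int]]:
    """Greedy k-member clustering with deterministic, centrality-based seeds."""
    if k < 1 or len(records) < k:
        raise ValueError("need at least k records and k >= 1")
    ranges = attribute_ranges(records)
    remaining = list(range(len(records)))
    clusters: List[List[int]] = []
    while len(remaining) >= k:
        seed = closeness_seed(remaining, records, ranges)
        remaining.remove(seed)
        # Simplification: fill the cluster with the seed's k-1 nearest records.
        remaining.sort(key=lambda j: distance(records[seed], records[j], ranges))
        clusters.append([seed] + remaining[:k - 1])
        remaining = remaining[k - 1:]
    # Fewer than k records are left: attach each to the cluster with the nearest seed.
    for j in remaining:
        nearest = min(clusters,
                      key=lambda c: distance(records[c[0]], records[j], ranges))
        nearest.append(j)
    return clusters


def information_loss(clusters: List[List[int]], records: List[Record]) -> float:
    """Size-weighted sum of normalized attribute ranges over all clusters."""
    ranges = attribute_ranges(records)
    loss = 0.0
    for cluster in clusters:
        cols = list(zip(*(records[i] for i in cluster)))
        loss += len(cluster) * sum((max(c) - min(c)) / r
                                   for c, r in zip(cols, ranges))
    return loss


if __name__ == "__main__":
    # Toy (age, hours-per-week) records, invented for illustration only.
    data = [(25, 40), (27, 42), (26, 41), (60, 20), (62, 22), (61, 21), (40, 30)]
    groups = k_member_clusters(data, k=3)
    print(groups, information_loss(groups, data))
```

Because no step draws a random number, repeated runs on the same input return the same clusters and the same loss value, which is the consistency property the abstract attributes to replacing random seed selection. Whether a centrality-based seed also lowers the loss depends on the data and on the actual KMCSSA procedure, neither of which this sketch reproduces.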
Keywords :
data privacy; database management systems; learning (artificial intelligence); medical information systems; pattern clustering; EMR; KMCSSA; University of California Irvine Machine Learning Repository; adult database; closeness centrality; data distortion; data generalization; electronic medical record privacy preservation; information distortion; information loss; k-anonymity clustering method; k-member cluster seed selection algorithm; k-member clustering anonymization approach k-anonymization; medical information; medical technology; patient medical data sharing; patient private data; patient recommendation systems; privacy protection algorithm; random seed selection; Closeness Centrality; Information Loss; Seed selection algorithm; k-anonymity; k-member clustering anonymization;
Conference_Title :
Soft Computing and Intelligent Systems (SCIS) and 13th International Symposium on Advanced Intelligent Systems (ISIS), 2012 Joint 6th International Conference on
Conference_Location :
Kobe
Print_ISBN :
978-1-4673-2742-8
DOI :
10.1109/SCIS-ISIS.2012.6505046