Title of article :
Data cloning: Data visualisation, smoothing, confidentiality, and encryption
Author/Authors :
Haslett, S.J. and Govindaraju, K.
Issue Information :
Journal issue with serial number, year 2012
Pages :
13
From page :
410
To page :
422
Abstract :
One simple way to change data for simple linear regression and still get the same fitted parameters is to add each of the residuals from the first model fit to each original observation. For n initial data points (x, y) this creates n² observations. More generally, adding {a_i: i = 1,…,m} to each observation produces mn new observations with the same simple linear regression fit, provided the sum over i of the a_i is zero. An alternative method, after mean adjustment, is to regress y on x and x on y, and use the predicted values ŷ and x̂ as new data; the regressions for ŷ on x̂ and for y on x are identical, as are the correlations between x and y and between x̂ and ŷ. The underlying principle can be extended to simple linear regression with intercept, to multiple linear regression, and to situations where the design matrix is not full rank and/or the data are not independent and identically distributed. For multiple linear regression, the procedure can be repeated many times, each time producing a new dataset with the same multiple linear regression fit as the original data. We call these datasets “cloned” or “matching”. One major advantage of such datasets is that, unlike the more usual model-based alternatives, parameter estimates of the original data and the cloned data are identical and include no model error. Data cloning consequently has potential uses in a wide range of applications from confidentialising or encrypting data, to data visualisation and smoothing. The encryption application is particularly interesting because it can be applied generally to databases even where there is no interest in regression modelling.
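The two constructions described in the abstract can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a straight-line model fitted with an intercept (so the residuals sum to zero), uses synthetic data, and the function names `fit_line`, `clone_by_offsets` and `clone_by_cross_prediction` are hypothetical names chosen for illustration.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares slope and intercept of y on x (assumed: model with intercept)."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept


def clone_by_offsets(x, y, offsets):
    """Pair every observation with every offset, giving (x_i, y_i + a_j).

    With n points and m offsets this returns m*n points. The fitted line is
    unchanged when the offsets sum to zero.
    """
    offsets = np.asarray(offsets, dtype=float)
    x_new = np.repeat(x, offsets.size)               # x_0, x_0, ..., x_1, x_1, ...
    y_new = (y[:, None] + offsets[None, :]).ravel()  # y_0+a_0, y_0+a_1, ...
    return x_new, y_new


def clone_by_cross_prediction(x, y):
    """Mean-adjust, regress y on x and x on y, and return the predictions (x_hat, y_hat)."""
    xc, yc = x - x.mean(), y - y.mean()
    b = (xc @ yc) / (xc @ xc)  # slope of y on x
    c = (xc @ yc) / (yc @ yc)  # slope of x on y
    return c * yc, b * xc


# Synthetic illustration data (an assumption, not taken from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 1.5 * x + 0.7 + rng.normal(scale=0.5, size=20)

slope, intercept = fit_line(x, y)
residuals = y - (slope * x + intercept)

# Method 1: add every residual to every observation (n^2 cloned points).
x_clone, y_clone = clone_by_offsets(x, y, residuals)
clone_slope, clone_intercept = fit_line(x_clone, y_clone)

# Method 2: use the cross-predicted values as the new data.
x_hat, y_hat = clone_by_cross_prediction(x, y)
hat_slope, _ = fit_line(x_hat, y_hat)
```

In this sketch `clone_slope` and `clone_intercept` agree with `slope` and `intercept` up to floating-point error, and `hat_slope` agrees with `slope`. Any other zero-sum set of offsets can be passed to `clone_by_offsets` in place of the residuals.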
Keywords :
Changed data, Multiple regression, Unaltered model fit
Journal title :
Journal of Statistical Planning and Inference
Serial Year :
2012
Record number :
2221744