ISBN: (print) 9783319997704; 9783319997711
Data sets that are subject to statistical disclosure limitation (SDL) often have many variables of different types that need to be altered for disclosure limitation. To produce a good-quality public data set, the data protector needs to account for the relationships between the variables. Hence, ideally, SDL methods should not be univariate, that is, treating each variable independently of the others, but multivariate, handling many variables at the same time. However, if a data set has many variables, as most government survey data do, the task of developing and implementing a multivariate approach for SDL becomes difficult. In this paper we propose a pre-masking data processing procedure which consists of clustering the variables of high-dimensional data sets, so that different groups of variables can be masked independently, thus reducing the complexity of SDL. We consider different hierarchical clustering methods, including our own version of a hierarchical clustering algorithm, which we call K-Link, and outline how the data protector can define an appropriate number of clusters for these methods. We implemented these methods and applied them to two genuine multivariate data sets. The results of the experiments show that K-Link has the potential to solve this problem efficiently. The success of the method, however, depends on the correlation structure of the data. For data sets where most of the variables are correlated, clustering the variables and then applying SDL methods independently to different clusters may lead to attenuated correlation in the masked data, even for efficient clustering methods. Thus, the proposed approach is a trade-off between the computational complexity of multivariate SDL methods and the data utility loss due to independent treatment of different clusters by SDL methods.
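The idea of clustering variables before masking can be illustrated with a minimal sketch. The abstract does not specify K-Link, so the code below is not K-Link: it swaps in standard average-linkage agglomerative clustering, and it assumes a dissimilarity of 1 - |Pearson correlation| between variables. The function names and the toy variables (`income`, `tax`, `rooms`, `area`) are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cluster_variables(columns, n_clusters):
    """Group variables by average-linkage agglomerative clustering.

    columns: dict mapping a variable name to its list of values.
    Assumption: dissimilarity between two variables is 1 - |correlation|.
    """
    names = list(columns)
    dist = {
        frozenset((a, b)): 1.0 - abs(pearson(columns[a], columns[b]))
        for a, b in combinations(names, 2)
    }

    def linkage(c1, c2):
        # Average pairwise dissimilarity between members of two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        # Merge the two closest clusters; combinations guarantees i < j.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy data: two strongly correlated pairs of variables.
toy = {
    "income": [1, 2, 3, 4, 5, 6],
    "tax": [2, 4, 6, 8, 10, 13],
    "rooms": [5, 1, 4, 2, 6, 3],
    "area": [10, 2, 8, 4, 12, 7],
}
groups = cluster_variables(toy, n_clusters=2)
```

Each resulting group could then be passed to a multivariate masking method on its own. Correlations between variables that land in different groups are not protected by this split, which is the utility loss the abstract warns about. For real data, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would likely be a better starting point than this hand-rolled loop.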
ISBN: (print) 9783319112565
The proceedings contain 28 papers. The special focus in this conference is on tabular data protection, microdata masking, protection using privacy models, synthetic data, record linkage, remote access and privacy-preserving protocols. The topics include: enabling statistical analysis of suppressed tabular data; assessing the information loss of controlled adjustment methods in two-way tables; further developments with perturbation techniques to protect tabular data; comparison of different sensitivity rules for tabular data and presenting a new rule; pre-tabular perturbation with controlled tabular adjustment: some considerations; measuring disclosure risk with entropy in population based frequency tables; a CTA model based on the Huber function; density approximant based on noise multiplied data; reverse mapping to preserve the marginal distributions of attributes in masked microdata; JPEG-based microdata protection; improving the utility of differential privacy via univariate microaggregation; differentially private exponential random graphs; differentially-private logistic regression for detecting multiple-SNP association in GWAS databases; disclosure risk evaluation for fully synthetic categorical data; v-dispersed synthetic data based on a mixture model with constraints; nonparametric generation of synthetic data for small geographic areas; using partially synthetic data to replace suppression in the business dynamics statistics; synthetic longitudinal business databases for international comparisons; a comparison of blocking methods for record linkage; probabilistic record linkage for disclosure risk assessment; comparison of two remote access systems recently developed and implemented in Australia; and towards secure and practical location privacy through private equality testing.
The Australian Bureau of Statistics has developed an additive noise method for automatically and consistently confidentialising tables of counts ‘on the fly’. Statistical properties of the perturbation are defined b...
Cartographic maps have many practical uses and can be an attractive alternative for disseminating detailed frequency tables. However, a detailed map may disclose private data of individual units of a population. We wi...
Statistical databases in general and data warehouses in particular are used to analyze large amounts of business data in predefined as well as ad-hoc reports. Operators of statistical databases must ensure that indivi...
We define On-Average KL-privacy and present its properties and connections to differential privacy, generalization and information-theoretic quantities including max-information and mutual information. The new definiti...
Witnesses are of the utmost importance in emergency systems since they can trigger timely location-based status alerts. However, their collaboration with the authorities can be impaired by people's fear of ...
ISBN: (print) 9783319112572; 9783319112565
For decades, national statistical offices (NSOs) have used complementary cell suppression for disclosure limitation of tabular data, magnitude data in particular. Indications of its continued use abound, even though suppression thwarts statistical analysis by both the expert and the novice. We introduce methods for creating alternative tables that the NSO can release unsuppressed, while ensuring within statistical certainty that their analysis is conformal with analysis of the original.
ISBN: (print) 9783319112572; 9783319112565
In this paper we describe a new procedure that is capable of ensuring that the marginal distributions of attributes in microdata masked with a masking mechanism end up being the same as the marginal distributions of attributes in the original data. We illustrate the application of the new procedure using several commonly used masking mechanisms.
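The abstract does not describe the mechanism itself, so the following is only a sketch of one simple way to obtain the stated property, not necessarily the authors' procedure. It assumes a single numeric attribute and uses rank-based mapping: each masked value is replaced by the original value that holds the same rank. The function name `restore_marginal` and the sample values are hypothetical.

```python
def restore_marginal(original, masked):
    """Replace each masked value by the original value of equal rank.

    The output is a permutation of `original`, so its marginal
    distribution is exactly that of the original attribute.
    """
    sorted_original = sorted(original)
    # Record indices ordered by masked value; ties keep record order.
    order = sorted(range(len(masked)), key=lambda i: masked[i])
    result = [None] * len(masked)
    for rank, idx in enumerate(order):
        result[idx] = sorted_original[rank]
    return result


# Hypothetical attribute values and a noisy masked version of them.
original = [31, 45, 27, 52, 38]
masked = [33.4, 41.9, 29.8, 49.0, 44.1]
released = restore_marginal(original, masked)
```

In this toy run the masking changed the relative order of the second and fifth records, so those two records end up with each other's original values, while the multiset of released values matches the original exactly. How much protection survives depends entirely on how much the masking mechanism reorders the records.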