Details
ISBN (digital): 9783031696510
ISBN (print): 9783031696503; 9783031696510
In recent years, national statistical organizations have increasingly relied on synthetic data when releasing microdata that contain sensitive personal or establishment information. This paper addresses the challenges of using synthetic data to protect the privacy of survey respondents. For survey data, it is often important to account for the survey design information when building the synthesis models. The paper discusses two techniques for generating synthetic survey microdata under informative sampling. Specifically, it examines an approach that combines design-based and model-based methods by using the pseudo-likelihood approach within the sequential regression framework. As far as we are aware, the pseudo-likelihood method has not previously been used in the context of sequential regression synthesis. This method is compared with an alternative approach in which design variables are included as predictors in the regression models. In the latter approach, the survey weights have to be synthesized and included in the final data product, whereas the former generates synthetic simple random samples that are representative of the original population and require no weights.
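To make the pseudo-likelihood idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes Gaussian linear models for every variable, plugs in point estimates instead of drawing parameters from a posterior, and uses hypothetical names (`fit_pseudo_likelihood_ols`, `synthesize_sequential`) and a made-up toy population. Each synthesis model is fitted by maximizing the survey-weighted log-likelihood, and the variables are synthesized one at a time, each conditional on those already synthesized.

```python
import numpy as np


def fit_pseudo_likelihood_ols(X, y, w):
    """Fit a Gaussian linear model by maximizing sum_i w_i * log f(y_i | x_i).

    For this model the pseudo-likelihood estimate of the coefficients is
    weighted least squares with the survey weights w.
    """
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    resid = y - X @ beta
    sigma2 = np.sum(w * resid**2) / np.sum(w)  # weighted residual variance
    return beta, sigma2


def synthesize_sequential(Y, w, n_synth, rng):
    """Synthesize the columns of Y in order, each conditional on earlier ones.

    Y is an (n, p) array of sampled values and w holds the survey weights.
    Simplification (assumption): fitted point estimates are plugged in.
    """
    n, p = Y.shape
    synth = np.empty((n_synth, p))
    for j in range(p):
        X_obs = np.column_stack([np.ones(n), Y[:, :j]])
        beta, sigma2 = fit_pseudo_likelihood_ols(X_obs, Y[:, j], w)
        X_syn = np.column_stack([np.ones(n_synth), synth[:, :j]])
        synth[:, j] = X_syn @ beta + rng.normal(0.0, np.sqrt(sigma2), n_synth)
    return synth


# Toy population (assumed for illustration only).
rng = np.random.default_rng(0)
N = 100_000
y1 = rng.normal(10.0, 2.0, N)
y2 = 1.0 + 0.5 * y1 + rng.normal(0.0, 1.0, N)
pop = np.column_stack([y1, y2])

# Informative Poisson sampling: units with larger y1 are more likely selected.
pi = np.clip(0.02 * np.exp(0.3 * (y1 - 10.0)), None, 1.0)
sampled = rng.random(N) < pi
sample, weights = pop[sampled], 1.0 / pi[sampled]

synth = synthesize_sequential(sample, weights, n_synth=5000, rng=rng)

print("population means:       ", pop.mean(axis=0))
print("unweighted sample means:", sample.mean(axis=0))
print("synthetic means:        ", synth.mean(axis=0))
```

In this toy setting the unweighted sample means are biased upward because selection depends on `y1`, while the synthetic records can be analyzed as a simple random sample without weights. The alternative approach described above would instead add the design variables as predictors and would also have to synthesize the weights.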