Often microdata sets contain attributes which are neither numerical nor ordinal, but take nominal values from a taxonomy, ontology or classification (e.g. diagnosis in a medical data set about patients, economic activ...
ISBN: 9781880843864 (print)
Large data collection organizations such as the Census Bureau often publish statistics to the public in the form of statistical databases. These databases are often transformed to some extent, omitting sensitive information such as Personal Identifying Information (PII). On the other hand, entities that collect vast amounts of data, such as the Census Bureau, Centers for Disease Control (CDC), academic institutions, and health organizations, to name a few, have to publish and share collected data with both the public and researchers, taking privacy concerns into consideration and staying in compliance with data privacy laws such as the Health Insurance Portability and Accountability Act of 1996 (HIPAA). Data collection organizations are also tasked with finding the optimal balance between the privacy and utility needs of the data being published. Therefore, the need to develop software applications that address such data privacy concerns is enormous. This paper proposes an implementation of an Online Students Health Record System application with data de-identification and access control capabilities in compliance with HIPAA rules, while at the same time realizing query efficiency and optimization.
In this paper, we define a novel setting for query auditing, where instead of detecting or preventing the disclosure of individual sensitive values, we want to detect or prevent the disclosure of aggregate values in t...
ISBN: 9781467303088 (print)
The proceedings contain 58 papers. The topics discussed include: Vietnamese diacritics restoration as sequential tagging; a framework to combine multiple matchers for pair-wise schema matching; linguistically motivated and ontological features for Vietnamese named entity recognition; a knowledge-driven educational decision support system; E-learning adoption and assimilation in SMEs: a research framework; an enhanced scheme for privacy-preserving association rules mining on horizontally distributed databases; extraction of discriminative patterns from skeleton sequences for human action recognition; dynamic time-linkage problems - the challenges; algorithms for testing of codes and lozenge-codes; mining association rules restricted on constraint; an efficient incremental mining approach based on IT-tree; and combining statistical machine learning with transformation rule learning for Vietnamese word sense disambiguation.
ISBN: 9783642286414 (print)
There is a significant body of empirical work on statistical de-anonymization attacks against databases containing micro-data about individuals, e.g., their preferences, movie ratings, or transaction data. Our goal is to analytically explain why such attacks work. Specifically, we analyze a variant of the Narayanan-Shmatikov algorithm that was used to effectively de-anonymize the Netflix database of movie ratings. We prove theorems characterizing mathematical properties of the database and the auxiliary information available to the adversary that enable two classes of privacy attacks. In the first attack, the adversary successfully identifies the individual about whom she possesses auxiliary information (an isolation attack). In the second attack, the adversary learns additional information about the individual, although she may not be able to uniquely identify him (an information amplification attack). We demonstrate the applicability of the analytical results by empirically verifying that the mathematical properties assumed of the database are actually true for a significant fraction of the records in the Netflix movie ratings database, which contains ratings from about 500,000 users.
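To make the notion of an isolation attack concrete, the following is a minimal sketch of score-based record matching. It is not the algorithm variant analyzed in the paper: the function names, the `tolerance` and `min_gap` parameters, and the plain unweighted match count are all assumptions made for illustration. The adversary scores every record against her auxiliary information and accepts the best match only if it clearly stands out from the runner-up.

```python
def similarity_score(aux, record, tolerance=1):
    """Count auxiliary attributes whose value in `record` is within `tolerance`.

    `aux` and `record` both map an item (e.g. a movie id) to a numeric rating.
    Unweighted counting is an illustrative simplification.
    """
    return sum(
        1
        for item, value in aux.items()
        if item in record and abs(record[item] - value) <= tolerance
    )


def isolate(aux, database, min_gap=2):
    """Return the id of the record that best matches `aux`, or None if ambiguous.

    A match is accepted only when the best score exceeds the second-best
    score by at least `min_gap` (a hypothetical threshold).
    """
    scored = sorted(
        ((similarity_score(aux, rec), rid) for rid, rec in database.items()),
        reverse=True,
    )
    if not scored:
        return None
    best, best_id = scored[0]
    second = scored[1][0] if len(scored) > 1 else 0
    return best_id if best - second >= min_gap else None


if __name__ == "__main__":
    # Toy data, invented for this example.
    database = {
        "u1": {"m1": 5, "m2": 3, "m3": 4},
        "u2": {"m1": 1, "m2": 3},
        "u3": {"m4": 2},
    }
    print(isolate({"m1": 5, "m2": 3, "m3": 4}, database))
    print(isolate({"m2": 3}, database))
```

When the auxiliary information matches several records about equally well, `isolate` returns `None`, which corresponds to the case where the adversary cannot uniquely identify the individual.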
ISBN: 3642158374 (print)
The proceedings contain 25 papers. The topics discussed include: privacy disclosure analysis and control for 2D contingency tables containing inaccurate data; a tool for analyzing and fixing infeasible RCTA instances; branch-and-cut versus cut-and-branch algorithms for cell suppression; data swapping for protecting census tables; eliminating small cells from census counts tables: some considerations on transition probabilities; uncertainty for anonymity and 2-dimensional range query distortion; PRAM optimization using an evolutionary algorithm; multiplicative noise protocols; measurement error and statistical disclosure control; semantic microaggregation for the anonymization of query logs; data environment analysis and the key variable mapping system; using support vector machines for generating synthetic datasets; synthetic data for small area estimation; and differential privacy and the risk-utility tradeoff for multi-dimensional contingency tables.
Summary form only given. Participatory sensing applications use participants' mobile phones to collect data from which statistical information about an environment or phenomenon is constructed. Because a mobile phone is closely tied to its owner's daily life, an invasion of privacy in participatory sensing could have dire consequences. In this research, we study a privacy-preserving participatory sensing technique, namely perturbation using negative surveys and limited negative surveys on mobile phones, to promote the use of participatory sensing in healthcare, investigation, and other useful applications. When participants report data in a negative survey, their mobile phones automatically select a value at random from the set complement of the sensed data value. In other words, we can construct public statistics without knowing the personal information of citizens. Additionally, our research extends negative surveys to limited negative surveys, which can be adapted to the features of the data, especially the number of categories. We combine negative surveys and limited negative surveys because it is difficult to construct valuable databases on mobile phones when the number of categories of the perturbed data is large. We also present an evaluation of these schemes from the viewpoint of the privacy and utility of the data sets on the central collection server.
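The basic negative survey described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the reported value is drawn uniformly from the complement, the function names and sample categories are hypothetical, and the limited negative survey variant is not covered. Under the uniform assumption, with n reports over c categories, the expected number of reports of category i is (n - t_i) / (c - 1), where t_i is the true count, so the server can estimate t_i as n - (c - 1) * r_i from the observed count r_i.

```python
import random
from collections import Counter


def negative_report(true_value, categories, rng):
    """Return a category chosen uniformly at random from those that are NOT the true value."""
    complement = [c for c in categories if c != true_value]
    return rng.choice(complement)


def estimate_counts(reports, categories):
    """Estimate the true per-category counts from negative reports.

    Uses t_i = n - (c - 1) * r_i, which holds in expectation when each report
    is uniform over the complement. Individual estimates can be negative for
    small samples.
    """
    n = len(reports)
    c = len(categories)
    observed = Counter(reports)
    return {cat: n - (c - 1) * observed[cat] for cat in categories}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    categories = ["low", "medium", "high", "severe"]  # invented example categories
    true_values = ["low"] * 5000 + ["medium"] * 3000 + ["high"] * 1500 + ["severe"] * 500
    reports = [negative_report(v, categories, rng) for v in true_values]
    print(estimate_counts(reports, categories))
```

The multiplier (c - 1) in the estimator amplifies sampling noise as the number of categories grows, which is consistent with the abstract's remark that large category sets make it hard to build useful databases.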
ISBN: 3540874704 (print)
The proceedings contain 27 papers. The topics discussed include: using a mathematical programming modeling language for optimal CTA; a data quality and data confidentiality assessment of complementary cell suppression; pre-processing optimisation applied to the classical integer programming model for statistical disclosure control; how to make the τ-ARGUS modular applicable to linked tables; Bayesian assessment of rounding-based disclosure control; cell bounds in two-way contingency tables based on conditional frequencies; invariant post-tabular protection of census frequency counts; a practical approach to balancing data confidentiality and research needs: the NHIS linked mortality files; and accounting for intruder uncertainty due to sampling when estimating identification disclosure risks in partially synthetic data.
The paper discusses the challenges linked to the need of the research community to have access to microdata files for scientific purposes. These needs have to be adequately balanced with the legal requirement of prese...