In local differential privacy (LDP), a challenging problem is generating high-dimensional data while efficiently capturing the correlation between attributes in a dataset. Existing solutions for low-dimensional data synthesis, which partition the privacy budget among all attributes, cease to be effective in high-dimensional scenarios because of the large-scale noise and communication cost caused by the high dimension. In fact, high dimensionality not only brings challenges but also makes it possible to apply some technologies to break this bottleneck. This paper presents SamPrivSyn for high-dimensional data synthesis under LDP, which is composed of a marginal sampling module and a data generation module. The marginal sampling module samples from the original data to obtain two-way marginals. The sampling process is based on mutual information, which is updated iteratively to retain as much of the correlation between attributes as possible. The data generation module reconstructs the synthetic dataset from the sampled two-way marginals. Furthermore, this study conducted comparison experiments on real-world datasets to demonstrate the effectiveness and efficiency of the proposed method, with results showing that SamPrivSyn can not only protect privacy but also retain the correlation information between the attributes.
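As a rough illustration of the quantity that drives the sampling step, the sketch below computes the mutual information of two attributes from a two-way marginal given as a table of counts. It is not the paper's algorithm: the function name `mutual_information`, the choice to skip non-positive cells (which noisy LDP estimates can produce), and the toy tables are all assumptions made for this example.

```python
import math

def mutual_information(joint_counts):
    """Mutual information (in nats) of two attributes from a two-way marginal.

    joint_counts[i][j] is the (possibly noisy) count of records with value i
    for the first attribute and value j for the second. Cells with a
    non-positive count are skipped; that is an assumption of this sketch.
    """
    total = sum(sum(row) for row in joint_counts)
    if total <= 0:
        return 0.0
    row_sums = [sum(row) for row in joint_counts]
    col_sums = [sum(col) for col in zip(*joint_counts)]
    mi = 0.0
    for i, row in enumerate(joint_counts):
        for j, count in enumerate(row):
            if count > 0:
                p_xy = count / total
                p_x = row_sums[i] / total
                p_y = col_sums[j] / total
                mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi

# Hypothetical two-way marginals for a pair of binary attributes.
independent = [[25, 25], [25, 25]]  # attributes carry no information about each other
dependent = [[50, 0], [0, 50]]      # attributes always take the same value

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
```

Under these assumptions, attribute pairs with higher mutual information would be the ones whose two-way marginals are most worth retaining.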
With the advancement of intelligence and networked automotive, the domain-centralized architecture, which employs time sensitive networking (TSN) as the inter-domain backbone network and control area network with flex...
Petri nets (PNs) are used for modeling and analyzing discrete event systems, such as communication protocols, traffic systems, human-computer interaction, and fault ***' state space explosion problem means that the state space of PNs grows exponentially with PNs' *** the fundamental reachability problem is still an NP-Hard problem in *** has been proved that the equivalence problem for the reachability sets of arbitrary PNs is undecidable except for some subclass of PNs [1]. That is, the reachability problem of arbitrary PNs cannot be solved ***, there is no efficient and accurate algorithm to solve the problem. In recent years, with the emergence of big data and the development of computing hardware, a series of breakthroughs have been achieved in machine learning, such as AlphaGo, AlphaFold, and ChatGPT [2−4]. As a data-driven approach, machine learning can learn potential mapping relationships between inputs and outputs from large-scale data.
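The state space explosion mentioned in this abstract can be made concrete with a brute-force reachability search. The following is a minimal sketch, not the method of the paper: it enumerates the markings reachable from an initial marking by breadth-first search. The function name `reachable_markings`, the `(consume, produce)` encoding of transitions, the `limit` cutoff, and the two-place net are all hypothetical choices for this example.

```python
from collections import deque

def reachable_markings(initial, transitions, limit=10_000):
    """Enumerate markings reachable from `initial` by breadth-first search.

    initial: tuple of token counts, one entry per place.
    transitions: list of (consume, produce) pairs, each a tuple with one
        entry per place.
    limit: stop expanding once at least this many markings have been found,
        because the reachable set can be very large or infinite.
    """
    seen = {initial}
    queue = deque([initial])
    while queue and len(seen) < limit:
        marking = queue.popleft()
        for consume, produce in transitions:
            # A transition is enabled when every place holds enough tokens.
            if all(m >= c for m, c in zip(marking, consume)):
                nxt = tuple(m - c + p
                            for m, c, p in zip(marking, consume, produce))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen

# Hypothetical net with two places: t0 moves a token from place 0 to
# place 1, and t1 moves a token back.
transitions = [((1, 0), (0, 1)), ((0, 1), (1, 0))]
markings = reachable_markings((2, 0), transitions)
```

Exhaustive enumeration of this kind is only feasible for small nets, which is the practical motivation for looking at alternatives.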
Weighted sampling methods based on k-nearest neighbors have been demonstrated to be effective in solving the class imbalance problem. However, they usually ignore the positional relationship between a sample and the heterogeneous samples in its neighborhood when calculating the sample weight. This paper proposes a novel neighborhood-weighted sampling method named NWBBagging to improve the Bagging algorithm's performance on imbalanced datasets. It considers the positional relationship between the center sample and the heterogeneous samples in its neighborhood when identifying critical samples. A parameter reduction method is also proposed and incorporated into the ensemble learning framework, which reduces the number of parameters and increases the classifiers' diversity. We compare NWBBagging with some state-of-the-art ensemble learning algorithms on 34 imbalanced datasets, and the results show that NWBBagging achieves better performance.
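For context, the kind of k-nearest-neighbor weighting that this abstract takes as its starting point can be sketched as follows. This is a generic baseline under assumed definitions, not NWBBagging itself: each sample is weighted by the fraction of its k nearest neighbors that belong to a different class, which deliberately ignores where those neighbors sit relative to the sample. The function name `heterogeneous_neighbor_weights` and the toy data are hypothetical.

```python
import math

def heterogeneous_neighbor_weights(points, labels, k=3):
    """Weight each sample by the share of its k nearest neighbors that
    carry a different class label (Euclidean distance, ties broken by index)."""
    weights = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors = others[:k]
        if not neighbors:
            weights.append(0.0)
            continue
        differing = sum(1 for j in neighbors if labels[j] != labels[i])
        weights.append(differing / len(neighbors))
    return weights

# Hypothetical toy data: the last point of class 1 lies inside the class 0 cluster.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (1, 1)]
labels = [0, 0, 0, 1, 1, 1]
weights = heterogeneous_neighbor_weights(points, labels, k=2)
```

Such weights could then serve as sampling probabilities when drawing bootstrap subsets; a position-aware scheme would additionally use the distances or directions of the differing neighbors.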
As an important subject of natural language generation, Controllable Text Generation (CTG) focuses on integrating additional constraints and controls while generating texts and has attracted a lot of attention. Existi...
Multi-Span Question Answering (MSQA) requires models to extract one or multiple answer spans from a given context to answer a question. Prior work mainly focuses on designing specific methods or applying heuristic str...
Heterogeneous fraud detection is an important means of credit card security assurance, which can utilize historical transaction records in a source and target domain to build an effective fraud detection model. Nevertheless, large feature distribution differences between source and target transaction instances and the complex intrinsic structure hidden behind transaction data make it difficult for existing credit card fraud detection (CCFD) models to capture and transfer the most informative feature representations, which seriously hinders detection performance. In this work, we propose a novel adaptive heterogeneous CCFD model based on deep reinforcement training subset selection (RTAHC), which mainly contains two components: a selection distribution generator (SDG) and a transaction fraud detector (TFD, including a feature extractor with an attention mechanism and a classifier). The SDG generates the selection probability distribution vector via the reinforcement reward mechanism, and then the transaction instances in the source domain that are relevant to the target domain are selected. The feature extractor with an attention mechanism learns the abstract deep semantic feature representations of the selected source transaction instances and the target domain. The joint training of SDG and TFD can provide more real-time and accurate transaction feature representations to reduce the distribution discrepancy between the two domains. We verify the detection performance of RTAHC on a large real-world credit card transaction dataset and four public datasets. Experimental results demonstrate that the RTAHC model exhibits competitive CCFD performance. Impact Statement: With the rise of artificial intelligence (AI)-generated models, credit card fraud has become increasingly rampant, which also causes tens of billions of U.S. dollars in credit card losses worldwide every year
Automatic image colorization has made tremendous progress in recent years. However, previous methods rarely consider the inherent diversity of natural images, and existing diverse colorization approaches still suffer ...
The precise prediction of multi-scale traffic is a ubiquitous challenge in the urbanization process for car owners, road administrators, and governments. In the case of complex road networks, current and past traffic ...
An efficient method of representing and retrieving information is an essential component of open domain QA. There are question and answer models that allow for real-time responses with speed benefit and scalability. N...