ISBN: 9781728185262 (electronic)
ISBN: 9781728185279 (print)
In privacy-preserving machine learning, individual parties are reluctant to share their sensitive training data due to privacy concerns, and even trained model parameters or predictions can cause serious privacy leakage. To address these problems, we demonstrate a generally applicable distributed Privacy-Preserving Prediction (DPPP) framework in which, instead of sharing sensitive data or model parameters, an untrusted aggregator combines only the predictions of multiple models under a provable privacy guarantee. Our framework integrates two main techniques to guarantee individual privacy. First, we introduce the improved Binomial Mechanism and the Discrete Gaussian Mechanism to achieve distributed differential privacy. Second, we use homomorphic encryption to ensure that the aggregator learns nothing but the noisy aggregated prediction. Experimental results show that our framework performs comparably to non-private frameworks and delivers better results than the local differentially private framework and the standalone framework.
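The distributed noise-and-aggregate idea can be illustrated with a minimal Python sketch. It assumes each party contributes a one-hot prediction vote, uses the basic Binomial Mechanism (not the improved variant described above), and substitutes additive masks that cancel modulo `Q` for homomorphic encryption so the example stays self-contained. The modulus `Q`, the noise parameter `m`, and all function names are hypothetical, and calibrating `m` to a target (ε, δ) is not shown.

```python
import random
from typing import List

# Hypothetical modulus for masked arithmetic; must exceed the largest possible noisy sum.
Q = 2**61 - 1


def binomial_half(m: int, rng: random.Random) -> int:
    """Sample Binomial(m, 1/2) as the sum of m fair coin flips."""
    return sum(rng.getrandbits(1) for _ in range(m))


def zero_sum_masks(n_parties: int, dim: int, rng: random.Random) -> List[List[int]]:
    """Stand-in for homomorphic encryption (NOT the framework's actual scheme):
    random masks that cancel mod Q when summed. A trusted dealer makes them here
    purely for illustration."""
    masks = [[rng.randrange(Q) for _ in range(dim)] for _ in range(n_parties - 1)]
    masks.append([(-sum(mask[j] for mask in masks)) % Q for j in range(dim)])
    return masks


def party_message(vote: List[int], m: int, mask: List[int], rng: random.Random) -> List[int]:
    """Add this party's Binomial noise share to each class count, then mask it."""
    noisy = [v + binomial_half(m, rng) for v in vote]
    return [(x + r) % Q for x, r in zip(noisy, mask)]


def aggregate(messages: List[List[int]], m: int) -> List[float]:
    """Sum the masked messages (masks cancel), then subtract the known noise mean n*m/2."""
    n = len(messages)
    dim = len(messages[0])
    total = [sum(msg[j] for msg in messages) % Q for j in range(dim)]
    return [t - n * m / 2 for t in total]


if __name__ == "__main__":
    rng = random.Random(7)
    votes = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0]]  # one-hot predictions from 4 parties
    m = 64  # per-party noise parameter (hypothetical; larger m means more noise)
    masks = zero_sum_masks(len(votes), 3, rng)
    msgs = [party_message(v, m, mk, rng) for v, mk in zip(votes, masks)]
    noisy_counts = aggregate(msgs, m)
    print("noisy counts:", noisy_counts)
    print("predicted class:", max(range(3), key=lambda c: noisy_counts[c]))
```

Because each party's noise share is an integer, the masked sum stays exact under modular arithmetic; the sum of n independent Binomial(m, 1/2) shares is Binomial(nm, 1/2), so no single party has to add the full amount of noise, and the aggregator sees only masked vectors plus the final noisy total.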
In recent years, the rapidly growing scale of graphs has spurred many parallel graph analytics frameworks that leverage the massive hardware resources of CPUs or GPUs. Existing CPU implementations are time-consuming, while GPU implementations are limited by memory capacity and programming complexity. In this paper, we present a high-performance hybrid CPU-GPU parallel graph analytics framework with good productivity, based on GraphMat. We map vertex programs to generalized sparse matrix-vector multiplication (SpMV) on GPUs to deliver high performance, and we propose a high-level abstraction that lets developers implement various graph algorithms with relatively little effort. In addition, we adopt several optimizations to reduce communication cost and exploit hardware resources, especially the memory hierarchy. We evaluate the proposed framework on three graph primitives (PageRank, BFS, and SSSP) using large-scale graphs. The experimental results show that our implementation achieves an average speedup of 7.0× over GraphMat running on two 6-core Intel Xeon CPUs. It can also process larger datasets than MapGraph, a state-of-the-art GPU-based framework, while achieving comparable performance.
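The mapping of vertex programs to generalized SpMV can be sketched in plain Python, loosely in the spirit of GraphMat's vertex-program model: a per-edge `process` step and a per-vertex `reduce` step replace the multiply and add of ordinary SpMV. The function names and the edge-list representation are hypothetical, and this CPU-only sketch does not model the framework's GPU kernels, hybrid CPU-GPU execution, or memory-hierarchy optimizations.

```python
from typing import Any, Callable, List, Optional, Tuple

Edge = Tuple[int, int]  # directed edge (src, dst)


def generalized_spmv(
    n: int,
    edges: List[Edge],
    x: List[Optional[Any]],
    process: Callable[[Any, int], Any],
    reduce: Callable[[Any, Any], Any],
) -> List[Optional[Any]]:
    """y[dst] = reduce over in-edges (src, dst) of process(x[src], src).
    Inactive sources (None) send nothing. With unit weights, process = identity
    and reduce = add, this is ordinary SpMV with the transposed adjacency matrix."""
    y: List[Optional[Any]] = [None] * n
    for src, dst in edges:
        if x[src] is None:  # inactive vertex sends no message
            continue
        msg = process(x[src], src)
        y[dst] = msg if y[dst] is None else reduce(y[dst], msg)
    return y


def bfs(n: int, edges: List[Edge], root: int) -> List[Optional[int]]:
    """Level-synchronous BFS: each iteration is one SpMV with (min, depth + 1)."""
    depth: List[Optional[int]] = [None] * n
    depth[root] = 0
    frontier: List[Optional[int]] = [None] * n
    frontier[root] = 0
    while any(d is not None for d in frontier):
        reached = generalized_spmv(n, edges, frontier, lambda d, _src: d + 1, min)
        frontier = [None] * n
        for v, d in enumerate(reached):
            if d is not None and depth[v] is None:  # apply: keep only newly discovered vertices
                depth[v] = d
                frontier[v] = d
    return depth


def pagerank(n: int, edges: List[Edge], damping: float = 0.85, iters: int = 50) -> List[float]:
    """PageRank as repeated (+, rank / out-degree) SpMV; dangling-node mass is dropped here."""
    out_deg = [0] * n
    for src, _ in edges:
        out_deg[src] += 1
    rank = [1.0 / n] * n
    for _ in range(iters):
        contrib = generalized_spmv(
            n, edges, rank, lambda r, src: r / out_deg[src], lambda a, b: a + b
        )
        rank = [(1 - damping) / n + damping * (c if c is not None else 0.0) for c in contrib]
    return rank
```

Swapping only `process` and `reduce` turns the same kernel into BFS or PageRank, which is why a single well-optimized generalized SpMV backend can serve many graph algorithms behind a high-level abstraction.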
To address the increasing need for detecting and validating protein biomarkers in clinical specimens, mass spectrometry (MS)-based targeted proteomic techniques, including selected reaction monitoring (SRM), parallel reaction monitoring (PRM), and massively parallel data-independent acquisition (DIA), have been *** optimal performance, they require the fragment ion spectra of targeted peptides as prior *** this report, we describe an MS pipeline and spectral resource to support targeted proteomics studies for human tissue *** build the spectral resource, we integrated common open-source MS computational tools to assemble a freely accessible computational workflow based on *** then applied the workflow to generate DPHL, a comprehensive DIA pan-human library, from 1096 data-dependent acquisition (DDA) MS raw files for 16 types of cancer *** extensive spectral resource was then applied to a proteomic study of 17 prostate cancer (PCa) ***, PRM validation was applied to a larger study of 57 PCa patients and the differential expression of three proteins in prostate tumor was *** a second application, the DPHL spectral resource was applied to a study consisting of plasma samples from 19 diffuse large B cell lymphoma (DLBCL) patients and 18 healthy control *** expressed proteins between DLBCL patients and healthy control subjects were detected by DIA-MS and confirmed by *** data demonstrate that the DPHL supports DIA and PRM MS pipelines for robust protein biomarker *** is freely accessible at https://***/page/***?id=IPX0001400000.
Uncertainty is a major challenge for the environment perception of autonomous robots. For instance, while building semantic maps (i.e., maps with semantic labels such as object names), the robot may encounter unexpected o...
BigNeuron is an open community bench-testing platform with the goal of setting open standards for accurate and fast automatic neuron tracing. We gathered a diverse set of image volumes across several species that is representative of the data obtained in many neuroscience laboratories interested in neuron tracing. Here, we report gold standard manual annotations generated for a subset of the available imaging datasets and quantified tracing quality for 35 automatic tracing algorithms. The goal of generating such a hand-curated, diverse dataset is to advance the development of tracing algorithms and enable generalizable benchmarking. Together with image quality features, we pooled the data into an interactive web application that enables users and developers to perform principal component analysis, t-distributed stochastic neighbor embedding, correlation and clustering, visualization of imaging and tracing data, and benchmarking of automatic tracing algorithms in user-defined data subsets. The image quality metrics explain most of the variance in the data, followed by neuromorphological features related to neuron size. We observed that diverse algorithms can provide complementary information for obtaining accurate results, and we developed a method that iteratively combines methods to generate consensus reconstructions. The resulting consensus trees provide estimates of the neuron structure ground truth that typically outperform single algorithms on noisy datasets. However, specific algorithms may outperform the consensus tree strategy under specific imaging conditions. Finally, to help users predict the most accurate automatic tracing results without manual annotations for comparison, we used support vector machine regression to predict reconstruction quality given an image volume and a set of automatic tracings.
To provide timely results for big data analytics, it is crucial to satisfy deadline requirements for MapReduce jobs in today's production environments. Much effort has been devoted to meeting deadlines, and existing solutions typically fall into two categories. The first allocates appropriate resources to complete the entire job before the specified time limit, but deadlines are still missed when constraints are tight or resources are insufficient; the second runs a pre-constructed sample based on deadline constraints, which can satisfy the time requirement but fails to maximize the volume of processed data. In this paper, we propose a deadline-oriented task scheduling approach, named 'Dart', to address this problem. Given a specified deadline and restricted resources, Dart uses an iterative estimation method, based on both historical data and the job's running status, to precisely estimate the job completion time in real time. Based on this estimate, Dart uses an approach-revise algorithm to make dynamic scheduling decisions that meet deadlines while maximizing the amount of processed data and mitigating stragglers. Dart also efficiently handles task failures and data skew, preventing them from degrading its performance. We validated our approach using workloads from OpenCloud and Facebook on a cluster of 64 virtual machines. The results show that Dart not only meets deadlines effectively but also processes near-maximum volumes of data, even with tight deadlines and limited resources.
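To make the idea of iteratively re-estimating completion time concrete, here is a minimal Python sketch under assumed simplifications: every task processes one equal-sized data block, the per-task duration is a shrinkage blend of a historical mean and the durations observed so far in the current run, and tasks run in waves over a fixed number of slots. `JobStatus`, `estimate_task_secs`, `tasks_schedulable`, and `prior_weight` are hypothetical names; this is not Dart's actual estimator or its approach-revise algorithm.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class JobStatus:
    historical_task_secs: float  # mean task duration from past runs (assumed available)
    finished_task_secs: List[float] = field(default_factory=list)  # observed in this run


def estimate_task_secs(status: JobStatus, prior_weight: float = 5.0) -> float:
    """Blend history with live observations; live data dominates as more tasks finish."""
    n = len(status.finished_task_secs)
    return (prior_weight * status.historical_task_secs + sum(status.finished_task_secs)) / (
        prior_weight + n
    )


def tasks_schedulable(
    status: JobStatus, now: float, deadline: float, slots: int, pending: int
) -> int:
    """How many pending tasks (data blocks) can still finish before the deadline,
    given `slots` tasks running in parallel."""
    est = estimate_task_secs(status)
    remaining = deadline - now
    if remaining <= 0 or est <= 0:
        return 0
    waves = math.floor(remaining / est)  # full rounds of tasks that fit in the remaining time
    return min(pending, waves * slots)
```

Re-running `tasks_schedulable` each time a task finishes (after appending its duration) revises the plan as the estimate tightens; a slower-than-expected run shrinks the number of blocks admitted before the deadline, which is the trade-off between meeting the deadline and maximizing processed data.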
Valuable training data is often owned by independent organizations and located in multiple data centers. Most deep learning approaches require centralizing the multi-datacenter data for performance purposes. In practi...
Hardware-based middleboxes are ubiquitous in computer networks and usually incur high deployment and management expenses. A recently arising trend aims to address these problems by outsourcing the functions of tradi...
Acoustic scene classification is an intricate problem for a machine. In this emerging field of research, deep Convolutional Neural Networks (CNNs) achieve convincing results. In this paper, we explore the use of multi-sc...
Internet-based virtual computing environment (iVCE) has been proposed to combine data centers and other kinds of computing resources on the Internet to provide efficient and economical services. Virtual machines (VMs) have been widely used in iVCE to isolate different users/jobs and ensure trustworthiness, but VMs traditionally take a long time to boot, which cannot meet the requirements of iVCE's large-scale and highly dynamic applications. To address this problem, in this paper we design and implement VirtMan, a fast booting system for large numbers of virtual machines in iVCE. VirtMan uses the Linux Small Computer System Interface (SCSI) target to remotely mount the source image in a scalable hierarchy, and leverages the homogeneity of a set of VMs to transfer only the necessary image data at runtime. We have implemented VirtMan both as a standalone system and for OpenStack. On our 100-server testbed, VirtMan boots 1000 VMs (with a 15 GB Windows Server 2008 image) on 100 physical servers in less than 120 s, which is three orders of magnitude less than in current public clouds.
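The combination of a scalable mount hierarchy with on-demand transfer can be illustrated by a small Python simulation. It assumes a k-ary tree in which each server reads image blocks lazily from its parent and caches them, so a block requested by many homogeneous VMs crosses each tree edge at most once. `ImageNode`, `build_hierarchy`, and the `fanout` parameter are hypothetical, and the Linux SCSI target and network transport are not modeled.

```python
from typing import Dict, List, Optional


class ImageNode:
    """One node of a hypothetical image-distribution tree with a local block cache."""

    def __init__(
        self,
        name: str,
        parent: Optional["ImageNode"] = None,
        source: Optional[Dict[int, bytes]] = None,
    ) -> None:
        self.name = name
        self.parent = parent
        self.source = source  # only the root holds the full base image
        self.cache: Dict[int, bytes] = {}
        self.fetches_from_parent = 0

    def read_block(self, idx: int) -> bytes:
        """Serve a block locally if cached; otherwise pull it lazily from upstream."""
        if idx in self.cache:
            return self.cache[idx]
        if self.parent is None:
            if self.source is None:
                raise ValueError("root node needs a source image")
            data = self.source[idx]
        else:
            data = self.parent.read_block(idx)
            self.fetches_from_parent += 1
        self.cache[idx] = data
        return data


def build_hierarchy(n_servers: int, fanout: int, source: Dict[int, bytes]) -> List[ImageNode]:
    """nodes[0] is the image source; server i (stored at nodes[i + 1])
    mounts from nodes[i // fanout], giving a fanout-ary tree."""
    nodes = [ImageNode("source", source=source)]
    for i in range(n_servers):
        nodes.append(ImageNode(f"server-{i}", parent=nodes[i // fanout]))
    return nodes
```

Booting several VMs per server that all touch the same boot blocks leaves each server with exactly one upstream fetch per distinct block, and the root reads only the blocks actually requested rather than the whole image; the tree keeps the per-node fan-out bounded as the number of servers grows.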