ISBN: 9781665408790 (print)
The main use of distributed block storage integrated with a cloud computing platform is to provide storage for virtual machine (VM) instances. Traditional desktop and server applications tend to be dominated by small I/O and to have limited parallelism. Hence, the performance of block storage serving such applications after they migrate to the cloud is largely determined by the latency of small I/O. This paper presents IndigoStore, an optimized Ceph backend for implementing cloud-scale block storage that provides virtual disks for cloud VMs. IndigoStore optimizes the Ceph BlueStore backend, the state-of-the-art distributed storage backend, to reduce both the average and tail latency of small I/O without wasting disk bandwidth when serving large I/O. We use both microbenchmarks and our production workloads to demonstrate that IndigoStore achieves 29% to 44% lower average latency and up to 1.23× lower 99.99th-percentile tail latency than BlueStore, without any notable negative effects on other performance metrics.
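The two metrics reported above, average latency and 99.99th-percentile tail latency, can be illustrated with a minimal sketch. It assumes the nearest-rank percentile definition and uses made-up latency samples; the function names and the data are hypothetical and are not taken from the IndigoStore evaluation.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


def summarize(latencies_us):
    """Return the mean and the 99.99th-percentile latency of a list of samples."""
    return {
        "mean": sum(latencies_us) / len(latencies_us),
        "p99.99": percentile(latencies_us, 99.99),
    }


# Hypothetical workload: 9,990 fast small-I/O requests and 10 slow outliers (microseconds).
latencies = [100.0] * 9990 + [5000.0] * 10
stats = summarize(latencies)
print(stats)
```

The handful of outliers barely moves the mean but fully determines the tail percentile, which is why the two figures are reported separately.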
We present a design and implementation of distributed sparse block grids that transparently scale from a single CPU to multi-GPU clusters. We support dynamic sparse grids as, e.g., occur in computer graphics with comp...
The traditional partial wave analysis (PWA) algorithm is designed to process data serially, which requires a large amount of memory to store runtime data and may exceed the memory capacity of a single node. It is therefore necessary to parallelize the algorithm in a distributed data computing framework to improve its performance. Within an existing production-level Hadoop cluster, we implement the PWA algorithm on top of Spark to process data stored on the low-level storage system HDFS. In this setting, however, sharing data through HDFS or through Spark's internal data communication mechanism is extremely inefficient. To solve this problem, this paper presents an in-memory parallel computing method for the PWA algorithm. With this system, runtime data can easily be shared among parallel algorithms. Owing to the data management mechanism of Alluxio, we can ensure complete data locality, remain compatible with the traditional data input/output method, and cache the most frequently reused data in memory to improve performance.
ISBN: 9781728190747 (print)
Cloud-based deep learning (DL) solutions have been widely used in applications ranging from image recognition to speech recognition. As commercial software and services, such solutions have raised the need to protect the intellectual property rights of the underlying DL models. Watermarking is the mainstream approach to this concern, primarily by embedding pre-defined secrets during a model's training process. However, existing efforts focus almost exclusively on detecting whether a target model is pirated, without considering traitor tracing. In this paper, we present SecureMark_DL, which enables a model owner to embed a unique fingerprint for every customer within the parameters of a DL model, to extract and verify the fingerprint from a pirated model, and hence to trace the rogue customer who illegally distributed the model for profit. We demonstrate that SecureMark_DL is robust against various attacks, including fingerprint collusion and network transformation (e.g., model compression and model fine-tuning). Extensive experiments on the MNIST and CIFAR10 datasets, as well as on various types of deep neural networks, show the superiority of SecureMark_DL in terms of training accuracy and robustness against various types of attacks.
ISBN: 9781450367356 (print)
There is an increasing need for secure data sharing. In practice, a group of data owners often adopts a heterogeneous security scheme under which each pair of parties decides on its own protocol for sharing data, with diverse levels of trust. The scheme also keeps track of how the data is used. This paper studies distributed SQL query answering in the heterogeneous security setting. We define query plans by incorporating toll functions that are determined by data sharing agreements and reflected in the use of various security facilities. We formalize query answering as a bi-criteria optimization problem that minimizes both the data sharing toll and the parallel query evaluation cost. We show that this problem is PSPACE-hard for SQL and $\Sigma_3^p$-hard for SPC, and that it is in NEXPTIME. Despite the hardness, we develop a set of approximate algorithms to generate distributed query plans that minimize the data sharing toll and reduce the parallel evaluation cost. Using real-life and synthetic data, we empirically verify the effectiveness, scalability and efficiency of our algorithms.
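To make the bi-criteria formulation concrete, the sketch below keeps only the candidate plans that are not dominated in both data sharing toll and evaluation cost (the Pareto front). It is an illustration of the trade-off, not the paper's approximate algorithms; the plan names and numbers are hypothetical.

```python
def pareto_front(plans):
    """Keep the plans that no other plan beats on both criteria.

    Each plan is a (name, toll, cost) tuple; lower is better for both numbers.
    """
    front = []
    for name, toll, cost in plans:
        dominated = any(
            t <= toll and c <= cost and (t < toll or c < cost)
            for _, t, c in plans
        )
        if not dominated:
            front.append((name, toll, cost))
    return front


# Hypothetical candidate plans: (name, data sharing toll, parallel evaluation cost).
candidates = [("A", 10, 5), ("B", 4, 9), ("C", 12, 6), ("D", 4, 12)]
print(pareto_front(candidates))
```

Plans C and D are dropped because A and B, respectively, are at least as good on both criteria; choosing between A and B is the actual trade-off between toll and cost.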
Smart grids facilitate the use of distributed and renewable resources on the supply side and providing consumers with a range of tailored services on the consumption side. The introduction of energy smart grid in Mala...
ISBN: 9781665408790 (print)
Industry and academia have proposed many distributed graph processing systems. However, the existing systems are not friendly enough for users such as data analysts and algorithm engineers. On the one hand, the programming models and interfaces of existing systems differ considerably, leading to high learning and program migration costs. On the other hand, these graph processing systems are tightly bound to the underlying distributed computing platforms, requiring users to be familiar with distributed computing. To improve the usability of distributed graph processing, we propose UniGPS, a unified distributed graph programming framework. First, we propose VCProg, a unified cross-platform graph programming model for UniGPS. VCProg hides the details of distributed computing from users and is compatible with the popular graph programming models Pregel, GAS, and Push-Pull. VCProg-based programs can be executed by compatible distributed graph processing systems without modification, reducing the learning overhead for users. Second, UniGPS supports Python as the programming language. We propose an execution environment isolation mechanism based on interprocess communication that enables Java/C++-based graph processing systems to call user-defined methods written in Python. The experimental results show that UniGPS enables users to process big graphs beyond the memory capacity of a single machine without sacrificing usability. UniGPS shows near-linear data scalability and machine scalability.
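For readers unfamiliar with the vertex-centric style that models such as Pregel share, here is a minimal single-process sketch of a synchronous superstep loop, used to label connected components. It is not the VCProg API: `run_vertex_centric`, `init` and `compute` are hypothetical names chosen for illustration, and nothing here is distributed.

```python
def run_vertex_centric(graph, init, compute, max_supersteps=50):
    """Run a Pregel-style synchronous computation on one machine.

    graph maps each vertex to a list of out-neighbours; every neighbour
    must itself be a key of graph.
    """
    values = {v: init(v) for v in graph}
    # Superstep 0: every vertex sends its initial value to its neighbours.
    inbox = {v: [] for v in graph}
    for v in graph:
        for n in graph[v]:
            inbox[n].append(values[v])
    for _ in range(max_supersteps):
        outbox = {v: [] for v in graph}
        changed = False
        for v, messages in inbox.items():
            if not messages:
                continue  # a vertex with no incoming messages stays inactive
            new_value = compute(values[v], messages)
            if new_value != values[v]:
                values[v] = new_value
                changed = True
                for n in graph[v]:
                    outbox[n].append(new_value)
        if not changed:
            break  # no vertex updated its value, so the computation has converged
        inbox = outbox
    return values


# Connected components: every vertex adopts the smallest label it has seen.
undirected = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
labels = run_vertex_centric(
    undirected,
    init=lambda v: v,
    compute=lambda current, messages: min(current, min(messages)),
)
print(labels)
```

The user supplies only the per-vertex `init` and `compute` functions; message routing and termination live in the driver, which is the part a distributed system would replace.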
Artificial intelligence has become the core driving force of a new round of industrial innovation around the world and a strategic technology that leads future development. The application of artificial intelligence t...
ISBN: 9781665418430; 9781665448260 (print)
Recently, interest in spreading a culture of sharing has been increasing at the social level, driven by deepening income polarization, poverty, the rapid growth of civil society, and the importance of the public interest activities of private organizations. In a 2016 survey by the National Statistical Office on changes in donation attitudes and awareness of donations, the items that accounted for a large share of the responses on spreading a donation culture were the convenience of donation methods and greater transparency of donation organizations. In the existing donation system, donors give directly to individual donation organizations. This method has a disadvantage: it is difficult for donors to see how their donations are spent. To improve the convenience of donation methods, a platform-based approach, platforms being at the core of the fourth industrial era, enables unrestricted interaction in networked environments such as mobile or PC and encourages open participation by the general public in talent donation. Blockchain is a technology for preventing data forgery and tampering, built on distributed computing technology. Because the blockchain records continuously changing data on all participating nodes, the operators of distributed nodes cannot arbitrarily manipulate the data. Blockchain technology is in the spotlight because it can secure transparency in transactions. In this paper, we introduce blockchain technology into the donation system to solve these problems.
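The tamper-evidence property mentioned above can be sketched with a hash chain on a single node. This is a simplified illustration, not the system proposed in the paper: it assumes SHA-256 over a JSON encoding of each block, the donation records are hypothetical, and replication across nodes and consensus are not modeled.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, payload):
    """Hash the block contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "prev": prev_hash, "payload": payload}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a record whose hash commits to everything before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append(
        {
            "index": index,
            "prev": prev_hash,
            "payload": payload,
            "hash": block_hash(index, prev_hash, payload),
        }
    )


def verify(chain):
    """Recompute every hash; any edited record breaks the chain from that point on."""
    prev_hash = GENESIS_PREV
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical donation records.
ledger = []
append_block(ledger, {"donor": "alice", "amount": 50})
append_block(ledger, {"donor": "bob", "amount": 20})
print(verify(ledger))
```

Changing any stored amount afterwards makes `verify` return `False`, because the stored hash no longer matches the recomputed one. In a real blockchain the same check is run by many nodes holding their own copies.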
ISBN: 9781728162515 (print)
Some graph analyses, such as those of social networks and biological networks, need large-scale graph construction and maintenance over distributed memory. Distributed data-streaming tools, including MapReduce and Spark, restrict the computational freedom needed for incremental graph modification and run-time graph visualization. Instead, we take an agent-based approach. We construct a graph from a scientific dataset in CSV, tab, or XML format; dispatch many reactive agents on it; and analyze the graph in the form of their collective group behavior: propagation, flocking, and collision. The key to success is automating the run-time construction and visualization of agent-navigable graphs mapped over distributed memory. We implemented this distributed graph-computing support in the multi-agent spatial simulation (MASS) library, coupled with the Cytoscape graph visualization software. This paper presents the MASS implementation techniques and demonstrates their execution performance in comparison to MapReduce and Spark, using two benchmark programs: (1) an incremental construction of a complete graph and (2) a KD tree construction.
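The two benchmark programs named above can be sketched sequentially as follows. This is a plain single-process illustration under assumed data structures (an adjacency dictionary and a dictionary-based tree node); it does not use the MASS library or agents, and the function names are hypothetical.

```python
def add_vertex_to_complete_graph(adjacency, vertex):
    """Incrementally grow a complete graph: connect the new vertex to every existing one."""
    existing = set(adjacency)
    for other in existing:
        adjacency[other].add(vertex)
    adjacency[vertex] = existing


def build_kd_tree(points, depth=0):
    """Build a KD tree by splitting on the median along a cycling axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {
        "point": ordered[mid],
        "axis": axis,
        "left": build_kd_tree(ordered[:mid], depth + 1),
        "right": build_kd_tree(ordered[mid + 1:], depth + 1),
    }


# (1) Incremental construction of a complete graph on five vertices.
graph = {}
for v in range(5):
    add_vertex_to_complete_graph(graph, v)
edge_count = sum(len(neighbours) for neighbours in graph.values()) // 2

# (2) KD tree over a small, hypothetical 2-D point set.
tree = build_kd_tree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(edge_count, tree["point"])
```

Each new vertex touches every existing vertex, so the first benchmark stresses incremental modification, while the second stresses recursive partitioning of the data.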