ISBN: 9781538605431 (print)
The recent commercial availability of Intel SGX (Software Guard eXtensions) provides a hardware-enabled building block for secure execution of software modules in an untrusted cloud. As an untrusted hypervisor/OS has no access to an enclave's running states, a VM (virtual machine) with enclaves running inside loses the capability of live migration, a key feature of VMs in the cloud. This paper presents the first study on the support for live migration of SGX-capable VMs. We identify the security properties that a secure enclave migration process should meet and propose a software-based solution. We leverage several techniques such as two-phase checkpointing and self-destroy to implement our design on a real SGX machine. Security analysis confirms the security of our proposed design and performance evaluation shows that it incurs negligible performance overhead. Besides, we give suggestions on the future hardware design for supporting transparent enclave migration.
In the Internet of Things, it is important to detect the various relations among objects for mining useful knowledge. Existing works on relation detection are based on centralized processing, which is not suitable for...
ISBN: 9781538632086 (print)
Cost-effective adaptive routing has a significant impact on overall performance for high-radix hierarchical topologies such as Dragonfly, which achieve a lower network diameter than traditional topologies such as Torus and Fat tree but exhibit a lower degree of adaptiveness for shortest-path routing. Existing adaptive routing methods for these hierarchical topologies improve adaptiveness by increasing path length, i.e., local or global adaptive routing, and thus suffer from complex and costly deadlock avoidance. This work aims to maximize routing adaptiveness at the minimum cost of deadlock avoidance. We propose a label routing method for high-radix hierarchical networks. This label routing uses a co-design methodology and coordinates the two pipelines, input queue and routing computation, in the router microarchitecture. Packets in the input buffer are labeled by our routing algorithm depending on network states. We reorganize the input buffer and develop a label routing algorithm named Green-Red Routing (GRR). GRR relaxes the requirement of using virtual channels to eliminate routing deadlock and reduces the buffer resources dedicated to deadlock avoidance. GRR carefully manages the buffer resources and balances their utilization, and achieves fully adaptive routing efficiently. We conduct extensive experiments to evaluate the performance of GRR on Dragonfly and compare it with state-of-the-art works. The results show that GRR achieves 10%-35% higher performance than existing routing algorithms under most traffic patterns.
Interconnection networks play an essential role in the architecture of large-scale high performance computing (HPC) systems. In this paper, we construct a novel family of low-diameter topologies, Galaxyfly, using techni...
Eliminating duplicate data in primary storage of clouds increases the cost-efficiency of cloud service providers and reduces the cost to users of cloud services. Most existing primary deduplication techniques either use inline caching to exploit locality in primary workloads or use post-processing deduplication running in system idle time to avoid a negative impact on I/O performance. However, neither works well on cloud servers running multiple services or applications, for two reasons. First, temporal locality of duplicate data writes may not exist in some primary storage workloads, so inline caching often fails to achieve a good deduplication ratio. Second, post-processing deduplication allows duplicate data to be written to disk, and therefore does not provide the benefit of I/O deduplication and requires high peak storage capacity. This paper presents HPDedup, a Hybrid Prioritized data Deduplication mechanism for storage systems shared by applications running in co-located virtual machines or containers, which fuses an inline and a post-processing process for exact deduplication. In the inline deduplication phase, HPDedup provides a fingerprint caching mechanism that estimates the temporal locality of duplicates in data streams from different VMs or applications and prioritizes the cache allocation for these streams based on the estimation. HPDedup also allows different deduplication thresholds for streams based on their spatial locality to reduce disk fragmentation. The post-processing phase removes from disk the duplicates whose fingerprints could not be cached due to weak temporal locality. The hybrid deduplication mechanism significantly reduces the amount of redundant data written to the storage system while maintaining inline data writing performance. Our experimental results show that HPDedup clearly outperforms the state-of-the-art primary storage deduplication techniques in terms of inline
ISBN: 9781931971287 (print)
In-memory key/value store (KV-store) is a key building block for many systems like databases and large websites. Two key requirements for such systems are efficiency and availability, which demand that a KV-store continuously handle millions of requests per second. A common approach to availability is replication such as primary-backup (PBR), which, however, requires M + 1 times the memory to tolerate M failures. This renders scarce memory unable to handle useful user requests. This paper makes the first case for building a highly available in-memory KV-store by integrating erasure coding to achieve memory efficiency, while not notably degrading performance. A main challenge is that an in-memory KV-store has much scattered metadata. A single KV put may cause excessive coding operations and parity updates due to numerous small updates to metadata. Our approach, namely Cocytus, addresses this challenge by using a hybrid scheme that leverages PBR for small-sized and scattered data (e.g., metadata and key), while only applying erasure coding to relatively large data (e.g., value). To mitigate well-known issues like lengthy recovery of erasure coding, Cocytus uses an online recovery scheme that leverages the replicated metadata information to continuously serve KV requests. We have applied Cocytus to Memcached. Evaluation using YCSB with different KV configurations shows that Cocytus incurs low overhead for latency and throughput, can tolerate node failures with fast online recovery, yet saves 33% to 46% memory compared to PBR when tolerating two failures.
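The memory argument in this abstract can be made concrete with a little arithmetic. The following is a minimal sketch, not Cocytus code: the function names, the split into metadata bytes and value bytes, and the coding parameter `k` (data blocks per stripe) are assumptions introduced only for illustration.

```python
def pbr_memory(data_bytes: float, failures: int) -> float:
    """Memory used by primary-backup replication: M + 1 full copies."""
    return data_bytes * (failures + 1)


def hybrid_memory(meta_bytes: float, value_bytes: float, k: int, failures: int) -> float:
    """Memory used when metadata is replicated and values are erasure-coded.

    Assumption: a code with k data blocks and `failures` parity blocks per
    stripe, so values cost (k + failures) / k times their original size.
    """
    replicated = meta_bytes * (failures + 1)
    coded = value_bytes * (k + failures) / k
    return replicated + coded


def saving(meta_bytes: float, value_bytes: float, k: int, failures: int) -> float:
    """Fraction of memory saved by the hybrid scheme relative to PBR."""
    pbr = pbr_memory(meta_bytes + value_bytes, failures)
    return 1.0 - hybrid_memory(meta_bytes, value_bytes, k, failures) / pbr
```

With hypothetical inputs of 10 units of metadata, 90 units of values, `k = 3`, and two tolerated failures, `saving(10, 90, 3, 2)` is about 0.4. Under this model the saving shrinks as the replicated metadata share grows.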
In this research, we apply the Green's theory for converting the partial differential equation to the boundary integral equation for geometric transformation. Green's theory is designed specifically for integr...
ISBN: 9781450332057 (print)
Large-scale graph-structured computation usually exhibits an iterative and convergence-oriented computing nature, where input data is computed iteratively until a convergence condition is reached. Such features have led to the development of two different computation modes for graph-structured programs, namely synchronous (Sync) and asynchronous (Async) modes. Unfortunately, there is currently no in-depth study of their execution properties, and thus programmers have to choose a mode manually, which either requires a deep understanding of the underlying graph engines or results in suboptimal performance. This paper makes the first comprehensive characterization of the performance of the two modes on a set of typical graph-parallel applications. Our study shows that the performance of the two modes varies significantly with different graph algorithms, partitioning methods, execution stages, input graphs and cluster scales, and no single mode consistently outperforms the other. To this end, this paper proposes Hsync, a hybrid graph computation mode that adaptively switches a graph-parallel program between the two modes for optimal performance. Hsync constantly collects execution statistics on-the-fly and leverages a set of heuristics to predict future performance and determine when a mode switch could be profitable. We have built online sampling and offline profiling approaches combined with a set of heuristics to accurately predict future performance in the two modes. A prototype called PowerSwitch has been built based on PowerGraph, a state-of-the-art distributed graph-parallel system, to support adaptive execution of graph algorithms. On a 48-node EC2-like cluster, PowerSwitch consistently outperforms the best of both modes, with a speedup ranging from 9% to 73% due to timely switching between the two modes. Copyright 2015 ACM.
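The idea of switching modes only when a switch "could be profitable" can be sketched as a simple cost comparison. This is an illustrative assumption, not the Hsync heuristics, which the abstract does not spell out: `ModeStats`, `choose_mode`, the throughput metric, and the fixed switch cost are all hypothetical names and simplifications.

```python
from dataclasses import dataclass


@dataclass
class ModeStats:
    """Predicted throughput (work units per second) for one execution mode."""
    predicted_throughput: float


def choose_mode(current: str, sync: ModeStats, async_: ModeStats,
                remaining_work: float, switch_cost_s: float) -> str:
    """Return the mode to run next: 'sync' or 'async'.

    Hypothetical rule: switch only if the predicted time to finish in the
    other mode, plus the one-off switch cost, beats staying in the current mode.
    """
    stats = {"sync": sync, "async": async_}
    other = "async" if current == "sync" else "sync"
    time_current = remaining_work / stats[current].predicted_throughput
    time_other = remaining_work / stats[other].predicted_throughput + switch_cost_s
    return other if time_other < time_current else current
```

For example, with predicted throughputs of 100 (Sync) and 200 (Async), 1000 units of remaining work, and a 1-second switch cost, a program running in Sync would switch to Async; if Async were only marginally faster and the switch cost higher, it would stay put.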
Many machine learning and data mining (MLDM) problems like recommendation, topic modeling, and medical diagnosis can be modeled as computing on bipartite graphs. However, most distributed graph-parallel systems are oblivious to the unique characteristics in such graphs and existing online graph partitioning algorithms usually cause excessive replication of vertices as well as significant pressure on network communication. This article identifies the challenges and opportunities of partitioning bipartite graphs for distributed MLDM processing and proposes BiGraph, a set of bipartite-oriented graph partitioning algorithms. BiGraph leverages observations such as the skewed distribution of vertices, discriminated computation load and imbalanced data sizes between the two subsets of vertices to derive a set of optimal graph partitioning algorithms that result in minimal vertex replication and network communication. BiGraph has been implemented on PowerGraph and is shown to have a performance boost up to 17.75X (from 1.16X) for four typical MLDM algorithms, due to reducing up to 80% vertex replication, and up to 96% network traffic.
The ability of servers to effectively execute tasks within Cloud datacenters varies due to heterogeneous CPU and memory capacities, resource contention situations, network configurations and operational age. Unexpectedly slow server nodes (node-level stragglers) result in assigned tasks becoming task-level stragglers, which dramatically impede parallel job execution. However, it is currently unknown how slow nodes directly correlate to task straggler manifestation. To address this knowledge gap, we propose a method for node performance modeling and ranking in Cloud datacenters based on analyzing parallel job execution tracelog data. By using a production Cloud system as a case study, we demonstrate how node execution performance is driven by temporal changes in node operation as opposed to node hardware capacity. Different sample sets have been filtered in order to evaluate the generality of our framework, and the analytic results demonstrate that node abilities of executing parallel tasks tend to follow a 3-parameter log-logistic distribution. Further statistical attribute values such as confidence interval, quantile value and extreme case probability can also be used for ranking and identifying potential straggler nodes within the cluster. We exploit a graph-based algorithm for partitioning server nodes into five levels, with 0.83% of node-level stragglers identified. Our work lays the foundation towards enhancing scheduling algorithms by avoiding slow nodes, reducing task straggler occurrence, and improving parallel job performance.
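To illustrate how quantile values of a fitted 3-parameter log-logistic distribution could be used to flag slow nodes, here is a minimal sketch. The parameterization (shape, scale, and a location shift `loc`), the function names, and the idea of thresholding at a chosen quantile are assumptions for illustration; the abstract does not give the fitted parameters or the exact ranking procedure.

```python
def loglogistic_cdf(x: float, shape: float, scale: float, loc: float) -> float:
    """CDF of a shifted (3-parameter) log-logistic distribution.

    Assumed form: F(x) = 1 / (1 + ((x - loc) / scale) ** (-shape)) for x > loc.
    """
    if x <= loc:
        return 0.0
    z = (x - loc) / scale
    return 1.0 / (1.0 + z ** (-shape))


def loglogistic_quantile(p: float, shape: float, scale: float, loc: float) -> float:
    """Inverse CDF: the value below which a fraction p of observations fall."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    return loc + scale * (p / (1.0 - p)) ** (1.0 / shape)


def is_potential_straggler(metric: float, shape: float, scale: float,
                           loc: float, p: float = 0.95) -> bool:
    """Flag a node whose slowness metric exceeds the p-quantile (hypothetical rule)."""
    return metric > loglogistic_quantile(p, shape, scale, loc)
```

Under this parameterization the median is `loc + scale`, which gives a quick sanity check on any fitted values before using the upper quantiles for ranking.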