Search Results - Inner Mongolia University Library

International Conference on Dependable Systems and Networks (DSN)

Authors: Jinyu Gu, Zhichao Hua, Yubin Xia, Haibo Chen, Binyu Zang, Haibing Guan, Jinming Li. Affiliations: Shanghai Key Laboratory of Scalable Computing and Systems, Shanghai Jiao Tong University; Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University; Huawei Technologies Inc.

ISBN: (print) 9781538605431

The recent commercial availability of Intel SGX (Software Guard eXtensions) provides a hardware-enabled building block for the secure execution of software modules in an untrusted cloud. Because an untrusted hypervisor/OS has no access to an enclave's running state, a VM (virtual machine) with enclaves running inside loses the capability of live migration, a key feature of VMs in the cloud. This paper presents the first study on supporting live migration of SGX-capable VMs. We identify the security properties that a secure enclave migration process should meet and propose a software-based solution. We leverage several techniques, such as two-phase checkpointing and self-destroy, to implement our design on a real SGX machine. Security analysis confirms the security of the proposed design, and performance evaluation shows that it incurs negligible performance overhead. We also offer suggestions for future hardware designs that support transparent enclave migration.

Keywords: Program processors, Hardware, Security, Virtual machine monitors, Cloud computing, Context

A Distributed Relation Detection Approach in the Internet of Things

Mobile Information Systems, 2017, Vol. 2017, No. 1

Authors: Zhu, Weiping; Lu, Hongliang; Cui, Xiaohui; Cao, Jiannong. Affiliations: International School of Software, Wuhan University, Wuhan, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China; Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong

In the Internet of Things, it is important to detect the various relations among objects in order to mine useful knowledge. Existing works on relation detection are based on centralized processing, which is not suitable for the Internet of Things owing to the unavailability of a server, single points of failure, computation bottlenecks, and the movement of objects. In this paper, we propose a distributed approach to detect relations among objects. We first build a system model for this problem that supports generic forms of relations and both physical time and logical time. Based on this model, we design the distributed Relation Detection Approach (DRDA), which uses a distributed spanning tree to detect relations through in-network processing. DRDA coordinates the distributed tree-building process among objects and automatically adjusts the depth of the routing tree to a proper value. Optimization across multiple relation detection tasks is also considered. Extensive simulations show that the proposed approach outperforms existing approaches in terms of energy consumption. © 2017 Weiping Zhu et al.
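
The energy argument for in-network processing can be illustrated with a toy model. The sketch below is not DRDA itself: it assumes a hypothetical connectivity graph given as an adjacency list, builds a BFS spanning tree rooted at one node, and compares the hop-level transmissions needed to forward every object's raw report to the root (centralized processing) with sending one merged partial result per tree edge (in-network processing).

```python
from collections import deque


def bfs_depths(adj, root):
    """Return the depth of every reachable node in a BFS spanning tree rooted at root."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in depth:  # first visit: v becomes a child of u in the tree
                depth[v] = depth[u] + 1
                queue.append(v)
    return depth


def message_costs(adj, root):
    """Compare transmissions: centralized forwarding vs. in-network aggregation."""
    depth = bfs_depths(adj, root)
    # Centralized: each node's raw report travels depth(v) hops to the root.
    centralized = sum(depth.values())
    # In-network: each non-root node sends one merged partial result to its parent.
    in_network = len(depth) - 1
    return centralized, in_network


if __name__ == "__main__":
    # Hypothetical chain of five objects: 0 - 1 - 2 - 3 - 4
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(message_costs(chain, root=0))  # (10, 4)
```

The gap widens as the tree gets deeper. This toy model ignores message size, radio range, and object mobility, all of which a real design such as DRDA has to handle.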

Keywords: Internet of things

An Efficient Label Routing on High-Radix Interconnection Networks

International Conference on Parallel and Distributed Systems (ICPADS)

Authors: Fei Lei, Dezun Dong, Xiangke Liao. Affiliations: College of Computer, National University of Defense Technology, China; National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, China; Collaborative Innovation Center of High Performance Computing, National University of Defense Technology, China

ISBN: (print) 9781538632086

Cost-effective adaptive routing has a significant impact on overall performance for high-radix hierarchical topologies such as Dragonfly, which achieve a lower network diameter than traditional topologies (Torus and Fat tree) but exhibit a lower degree of adaptiveness for shortest-path routing. Existing adaptive routing methods for these hierarchical topologies improve adaptiveness by increasing path length, i.e., local or global adaptive routing, and thus suffer from complex and costly deadlock avoidance. This work aims to maximize routing adaptiveness at the minimum cost of deadlock avoidance. We propose a label routing method for high-radix hierarchical networks. This label routing uses a co-design methodology and coordinates two pipelines in the router microarchitecture: the input queue and routing computation. Packets in the input buffer are labeled by our routing algorithm depending on network state. We reorganize the input buffer and develop a label routing algorithm named Green-Red Routing (GRR). GRR relaxes the requirement of using virtual channels to eliminate routing deadlock and reduces the buffer resources dedicated to deadlock avoidance. GRR carefully manages buffer resources and balances their utilization, achieving fully adaptive routing efficiently. We conduct extensive experiments to evaluate the performance of GRR on Dragonfly and compare it with state-of-the-art works. The results show that GRR achieves 10%-35% higher performance than existing routing algorithms under most traffic patterns.

Keywords: Routing, System recovery, Network topology, Topology, Resource management, Adaptive systems, Organizations

Galaxyfly: A novel family of flexible-radix low-diameter topologies for large-scales interconnection networks

30th International Conference on Supercomputing, ICS 2016

Authors: Lei, Fei; Dong, Dezun; Liao, Xiangke; Su, Xing; Li, Cunlu. Affiliations: National Laboratory for Parallel and Distributed Processing; Collaborative Innovation Center of High Performance Computing; College of Computer, National University of Defense Technology, Changsha 410073, China

ISBN: (print) 9781450343619

The interconnection network plays an essential role in the architecture of large-scale high performance computing (HPC) systems. In this paper, we construct a novel family of low-diameter topologies, Galaxyfly, using techniques of algebraic graphs over finite fields. Galaxyfly is guaranteed to retain a small constant diameter while achieving a flexible tradeoff between network scale and bisection bandwidth. Galaxyfly lowers the demand for high-radix network routers and can use routers of only moderate radix to build exascale interconnection networks. We present effective congestion-aware routing algorithms for Galaxyfly by exploiting its algebraic properties. We conduct extensive simulations and analysis to evaluate the performance, cost, and power consumption of Galaxyfly against state-of-the-art topologies. The results show that our design achieves better performance than most existing topologies under various routing algorithms and traffic patterns, and is cost-effective to deploy for exascale HPC systems. © 2016 ACM.

Keywords: Topology

HPDedup: A hybrid prioritized data deduplication mechanism for primary storage in the cloud

arXiv, 2017

Authors: Wu, Huijun; Wang, Chen; Fu, Yinjin; Sakr, Sherif; Zhu, Liming; Lu, Kai. Affiliations: Data61, CSIRO; University of New South Wales, Australia; PLA University of Science and Technology, China; Science and Technology on Parallel and Distributed Laboratory; State Key Laboratory of High Performance Computing; State Key Laboratory of High-end Server & Storage Technology; College of Computer, National University of Defense Technology, Changsha, China

Eliminating duplicate data in the primary storage of clouds increases the cost-efficiency of cloud service providers and reduces users' costs for cloud services. Most existing primary deduplication techniques either use inline caching to exploit locality in primary workloads or use post-processing deduplication that runs during system idle time to avoid a negative impact on I/O performance. However, neither works well on cloud servers running multiple services or applications, for two reasons. First, the temporal locality of duplicate data writes may not exist in some primary storage workloads, so inline caching often fails to achieve a good deduplication ratio. Second, post-processing deduplication allows duplicate data to be written to disks, and therefore provides no I/O deduplication benefit and requires high peak storage capacity. This paper presents HPDedup, a Hybrid Prioritized data Deduplication mechanism for storage systems shared by applications running in co-located virtual machines or containers, which fuses an inline process and a post-processing process for exact deduplication. In the inline deduplication phase, HPDedup provides a fingerprint caching mechanism that estimates the temporal locality of duplicates in data streams from different VMs or applications and prioritizes cache allocation for these streams based on the estimate. HPDedup also allows different deduplication thresholds for streams based on their spatial locality to reduce disk fragmentation. The post-processing phase removes from disks the duplicates whose fingerprints could not be cached because of weak temporal locality. The hybrid deduplication mechanism significantly reduces the amount of redundant data written to the storage system while maintaining inline data writing performance. Our experimental results show that HPDedup clearly outperforms the state-of-the-art primary storage deduplication techniques in terms of inline
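
To make the inline/post-processing split concrete, here is a toy model of hybrid exact deduplication. It is not HPDedup's algorithm: it assumes a single fixed-capacity LRU fingerprint cache (HPDedup instead prioritizes cache space per stream by estimated temporal locality), SHA-1 fingerprints over whole chunks with collisions ignored, and an in-memory dict standing in for the disk. Chunks whose fingerprints hit the cache are deduplicated inline; misses are written, and a later post-processing pass merges the remaining duplicates.

```python
import hashlib
from collections import OrderedDict


class HybridDedupStore:
    """Toy hybrid exact deduplication: inline via a bounded LRU fingerprint cache,
    followed by an offline post-processing pass (not HPDedup's actual policy)."""

    def __init__(self, cache_capacity):
        self.cache_capacity = cache_capacity
        self.cache = OrderedDict()  # fingerprint -> block id, kept in LRU order
        self.blocks = {}            # block id -> chunk bytes (stands in for the disk)
        self.refs = []              # logical write sequence as a list of block ids
        self.next_id = 0

    def write(self, chunk):
        fp = hashlib.sha1(chunk).hexdigest()
        if fp in self.cache:
            # Inline hit: reference the existing block; nothing is written to disk.
            self.cache.move_to_end(fp)
            self.refs.append(self.cache[fp])
            return
        # Inline miss: the chunk is written even if an uncached duplicate exists.
        bid = self.next_id
        self.next_id += 1
        self.blocks[bid] = chunk
        self.refs.append(bid)
        self.cache[fp] = bid
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)  # evict the least recently used fingerprint

    def post_process(self):
        """Offline pass: merge duplicate blocks that the inline phase missed."""
        canonical = {}  # fingerprint -> surviving block id
        remap = {}      # block id -> surviving block id
        for bid, chunk in sorted(self.blocks.items()):
            fp = hashlib.sha1(chunk).hexdigest()
            remap[bid] = canonical.setdefault(fp, bid)
        self.refs = [remap[b] for b in self.refs]
        self.blocks = {b: c for b, c in self.blocks.items() if remap[b] == b}
        for fp in list(self.cache):
            self.cache[fp] = remap[self.cache[fp]]  # repoint at surviving blocks

    def read_all(self):
        """Reassemble the logical data stream from block references."""
        return b"".join(self.blocks[b] for b in self.refs)


if __name__ == "__main__":
    store = HybridDedupStore(cache_capacity=1)
    for chunk in (b"A", b"B", b"A"):
        store.write(chunk)
    print(len(store.blocks))  # 3: the second b"A" missed the one-entry cache
    store.post_process()
    print(len(store.blocks), store.read_all())  # 2 b'ABA'
```

With a one-entry cache, the second `A` misses inline because `B` evicted its fingerprint, so three blocks reach the disk until post-processing merges them. Deciding how much cache each stream deserves is the part HPDedup addresses with its locality estimation.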

Keywords: Efficiency

Efficient and available in-memory KV-store with hybrid erasure coding and replication

Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST '16)

Authors: Heng Zhang, Mingkai Dong, Haibo Chen. Affiliations: Shanghai Key Laboratory of Scalable Computing and Systems; Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University

ISBN: (print) 9781931971287

In-memory key/value stores (KV-stores) are a key building block for many systems, such as databases and large websites. Two key requirements for such systems are efficiency and availability, which demand that a KV-store continuously handle millions of requests per second. A common approach to availability is replication, such as primary-backup replication (PBR), which, however, requires M + 1 times the memory to tolerate M failures. This renders scarce memory unable to handle useful user *** paper makes the first case for building a highly available in-memory KV-store by integrating erasure coding to achieve memory efficiency without notably degrading performance. A main challenge is that an in-memory KV-store has much scattered metadata: a single KV put may cause excessive coding operations and parity updates due to numerous small updates to metadata. Our approach, named Cocytus, addresses this challenge with a hybrid scheme that uses PBR for small, scattered data (e.g., metadata and keys) while applying erasure coding only to relatively large data (e.g., values). To mitigate well-known issues of erasure coding such as lengthy recovery, Cocytus uses an online recovery scheme that leverages the replicated metadata to continue serving KV requests. We have applied Cocytus to Memcached. Evaluation using YCSB with different KV configurations shows that Cocytus incurs low latency and throughput overhead, tolerates node failures with fast online recovery, and saves 33% to 46% memory compared to PBR when tolerating two failures.
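
The memory argument can be checked with simple arithmetic. Tolerating M failures with primary-backup replication keeps M + 1 full copies, whereas a systematic (k, m) erasure code with m = M parity blocks per k data blocks stores (k + m)/k times the data. The sketch below compares the two for M = 2. The values of k are assumptions chosen for illustration, and because Cocytus still replicates metadata and keys, its actual savings sit below the pure erasure-coding figure for the same k.

```python
from fractions import Fraction


def pbr_overhead(m_failures: int) -> Fraction:
    """Primary-backup replication: one primary plus M backups."""
    return Fraction(m_failures + 1)


def ec_overhead(k_data: int, m_parity: int) -> Fraction:
    """Systematic erasure code: k data blocks plus m parity blocks per stripe."""
    return Fraction(k_data + m_parity, k_data)


def memory_saving(k_data: int, m_failures: int) -> Fraction:
    """Fraction of memory saved by erasure coding relative to PBR for the same M."""
    return 1 - ec_overhead(k_data, m_failures) / pbr_overhead(m_failures)


if __name__ == "__main__":
    M = 2  # tolerate two failures, matching the evaluation setting in the abstract
    for k in (2, 3, 4):  # illustrative stripe widths, not taken from the paper
        s = memory_saving(k, M)
        print(f"k={k}: EC {float(ec_overhead(k, M)):.2f}x vs PBR {pbr_overhead(M)}x, "
              f"saving {float(s):.1%}")
```

For M = 2, replication costs 3x memory, while k = 2, 3, and 4 give 2x, about 1.67x, and 1.5x, i.e. savings of one third, four ninths, and one half on the erasure-coded portion alone.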

Direct method-Green's theory: From PDE to BIE in the geometric transformation

2016 International Conference on Wavelet Analysis and Pattern Recognition, ICWAPR 2016

Authors: Yang, Li-Na; Li, Tao-Shen; Tang, Yuan Yan; Xu, Jia; Pan, Jian-Jia; Luo, Hui-Wu; Zheng, Xian-Wei. Affiliations: School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China; Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, China; Guangxi Colleges and Universities Key Laboratory of Parallel and Distributed Computing, Nanning 530004, China

ISBN: (print) 9781509035885

In this research, we apply Green's theory to convert partial differential equations (PDEs) into boundary integral equations (BIEs) for geometric transformation. Green's theory is designed specifically for integral equations. It has been verified to be efficient at detecting singular points of the geometric transformation. Experimental results show that Green's theory performs well. © 2016 IEEE.
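
For readers unfamiliar with the direct method, the standard PDE-to-BIE step can be written out for a model problem. The derivation below assumes the two-dimensional Laplace equation on a domain $\Omega$ with smooth boundary $\Gamma$ and outward normal $n$; the abstract does not state which PDE the authors use for geometric transformation, so this only illustrates the general technique.

```latex
\begin{aligned}
&\Delta u = 0 \ \text{in } \Omega, \qquad
  G(x,y) = -\frac{1}{2\pi}\ln\lvert x-y\rvert, \qquad
  -\Delta_y G(x,y) = \delta(x-y),\\[4pt]
&\int_\Omega \bigl(u\,\Delta_y G - G\,\Delta u\bigr)\,dy
  = \oint_\Gamma \Bigl(u\,\frac{\partial G}{\partial n_y} - G\,\frac{\partial u}{\partial n_y}\Bigr)\,ds_y
  \quad \text{(Green's second identity)},\\[4pt]
&u(x) = \oint_\Gamma \Bigl(G(x,y)\,\frac{\partial u}{\partial n_y}(y)
  - u(y)\,\frac{\partial G}{\partial n_y}(x,y)\Bigr)\,ds_y,
  \qquad x \in \Omega,\\[4pt]
&\tfrac{1}{2}\,u(x) = \oint_\Gamma \Bigl(G(x,y)\,\frac{\partial u}{\partial n_y}(y)
  - u(y)\,\frac{\partial G}{\partial n_y}(x,y)\Bigr)\,ds_y,
  \qquad x \in \Gamma \ \text{(smooth boundary point)}.
\end{aligned}
```

The third line follows from the second because $\Delta u = 0$ and $\int_\Omega u\,\Delta_y G\,dy = -u(x)$. Only boundary values of $u$ and $\partial u/\partial n$ remain in the last equation, which is why the direct method reduces a domain problem to one posed on $\Gamma$.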

Keywords: Partial differential equations

SYNC or ASYNC: Time to fuse for distributed graph-parallel computation

20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015

Authors: Xie, Chenning; Chen, Rong; Guan, Haibing; Zang, Binyu; Chen, Haibo. Affiliations: Shanghai Key Laboratory of Scalable Computing and Systems, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University, China; Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science, Shanghai Jiao Tong University, China

ISBN: (print) 9781450332057

Large-scale graph-structured computation usually exhibits an iterative, convergence-oriented nature, where input data is computed iteratively until a convergence condition is reached. Such features have led to the development of two different computation modes for graph-structured programs, namely synchronous (Sync) and asynchronous (Async) modes. Unfortunately, there is currently no in-depth study of their execution properties, so programmers have to choose a mode manually, either requiring a deep understanding of the underlying graph engines or suffering from suboptimal performance. This paper makes the first comprehensive characterization of the performance of the two modes on a set of typical graph-parallel applications. Our study shows that the performance of the two modes varies significantly with graph algorithms, partitioning methods, execution stages, input graphs, and cluster scales, and that no single mode consistently outperforms the other. To this end, this paper proposes Hsync, a hybrid graph computation mode that adaptively switches a graph-parallel program between the two modes for optimal performance. Hsync continuously collects execution statistics on the fly and leverages a set of heuristics to predict future performance and determine when a mode switch could be profitable. We have built online sampling and offline profiling approaches, combined with a set of heuristics, to accurately predict future performance in the two modes. A prototype called PowerSwitch has been built on PowerGraph, a state-of-the-art distributed graph-parallel system, to support adaptive execution of graph algorithms. On a 48-node EC2-like cluster, PowerSwitch consistently outperforms the best of both modes, with speedups ranging from 9% to 73% thanks to timely switching between the two modes. Copyright 2015 ACM.
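
The abstract does not spell out Hsync's heuristics, so the following is only a hypothetical illustration of profit-driven mode switching in general. It keeps a short moving average of measured seconds per iteration in the current mode, compares it with a caller-supplied estimate for the other mode (standing in for online sampling or offline profiling), and switches only when the predicted saving over an assumed planning horizon exceeds an assumed one-time switching cost. The class name, window size, horizon, and cost model are all assumptions.

```python
from collections import deque


class ModeSwitcher:
    """Hypothetical profit-driven switch between 'sync' and 'async' modes."""

    def __init__(self, window=3, switch_cost=1.0, horizon=10):
        self.mode = "sync"
        self.window = window            # iterations in the moving average
        self.switch_cost = switch_cost  # assumed one-time switching cost, in seconds
        self.horizon = horizon          # iterations the prediction is trusted for
        self.samples = deque(maxlen=window)

    def observe(self, seconds_per_iteration):
        """Record the measured time of one iteration in the current mode."""
        self.samples.append(seconds_per_iteration)

    def maybe_switch(self, predicted_other_seconds):
        """Switch if the other mode is predicted to save more than the switch cost."""
        if len(self.samples) < self.window:
            return self.mode  # not enough statistics collected yet
        current = sum(self.samples) / len(self.samples)
        predicted_gain = (current - predicted_other_seconds) * self.horizon
        if predicted_gain > self.switch_cost:
            self.mode = "async" if self.mode == "sync" else "sync"
            self.samples.clear()  # old statistics describe the previous mode
        return self.mode
```

Requiring the gain to exceed a fixed cost is a simple guard against oscillating between modes when their predicted performance is close.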

Keywords: Graphic methods

Bipartite-Oriented Distributed Graph Partitioning for Big Learning

Journal of Computer Science & Technology, 2015, Vol. 30, No. 1, pp. 20-29

Authors: Chen, Rong (陈榕); Shi, Jiaxin (施佳鑫); Chen, Haibo (陈海波); Zang, Binyu (臧斌宇). Affiliations: Shanghai Key Laboratory of Scalable Computing and Systems, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University, Shanghai 200240, China

Many machine learning and data mining (MLDM) problems, such as recommendation, topic modeling, and medical diagnosis, can be modeled as computing on bipartite graphs. However, most distributed graph-parallel systems are oblivious to the unique characteristics of such graphs, and existing online graph partitioning algorithms usually cause excessive replication of vertices as well as significant pressure on network communication. This article identifies the challenges and opportunities of partitioning bipartite graphs for distributed MLDM processing and proposes BiGraph, a set of bipartite-oriented graph partitioning algorithms. BiGraph leverages observations such as the skewed distribution of vertices, discriminated computation load, and imbalanced data sizes between the two subsets of vertices to derive a set of optimal graph partitioning algorithms that result in minimal vertex replication and network communication. BiGraph has been implemented on PowerGraph and is shown to deliver a performance boost ranging from 1.16X to 17.75X for four typical MLDM algorithms, by reducing vertex replication by up to 80% and network traffic by up to 96%.
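
Vertex replication, the quantity BiGraph reduces, is commonly measured by the replication factor: the average number of partitions holding a copy of each vertex when edges are assigned to partitions. The sketch below computes it for a hypothetical user-item bipartite graph under two hypothetical edge assignments on two partitions. It is not one of BiGraph's algorithms; it only shows how grouping edges by the vertices of one subset keeps those vertices from being replicated.

```python
from collections import defaultdict


def replication_factor(edges, assignment):
    """Average number of partitions that hold a replica of each vertex.

    edges: list of (src, dst) pairs; assignment: partition id of each edge, same order.
    """
    replicas = defaultdict(set)
    for (src, dst), part in zip(edges, assignment):
        replicas[src].add(part)  # an edge needs both endpoints on its partition
        replicas[dst].add(part)
    return sum(len(parts) for parts in replicas.values()) / len(replicas)


if __name__ == "__main__":
    # Hypothetical user-item bipartite graph.
    edges = [("u0", "i0"), ("u0", "i1"), ("u0", "i2"), ("u1", "i0"), ("u1", "i1")]
    round_robin = [i % 2 for i in range(len(edges))]
    by_user = [0 if src == "u0" else 1 for src, _ in edges]
    print(replication_factor(edges, round_robin))  # 1.8
    print(replication_factor(edges, by_user))      # 1.4
```

Assigning edges by their user endpoint leaves every user vertex on a single partition, so only item vertices are replicated; round-robin placement replicates both sides.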

Keywords: bipartite graph, graph partitioning, graph-parallel system

An Approach for Modeling and Ranking Node-Level Stragglers in Cloud Datacenters

IEEE International Conference on Services Computing (SCC)

Authors: Xue Ouyang, Peter Garraghan, Changjian Wang, Paul Townend, Jie Xu. Affiliations: Parallel and Distributed Laboratory, National University of Defense Technology, Changsha, China; School of Computing, University of Leeds, Leeds, UK

The ability of servers to execute tasks effectively within Cloud datacenters varies due to heterogeneous CPU and memory capacities, resource contention, network configurations, and operational age. Unexpectedly slow server nodes (node-level stragglers) cause their assigned tasks to become task-level stragglers, which dramatically impede parallel job execution. However, it is currently unknown how slow nodes directly correlate with the manifestation of task stragglers. To address this knowledge gap, we propose a method for node performance modeling and ranking in Cloud datacenters based on analyzing parallel job execution trace log data. Using a production Cloud system as a case study, we demonstrate how node execution performance is driven by temporal changes in node operation rather than by node hardware capacity. Different sample sets have been filtered to evaluate the generality of our framework, and the analytic results demonstrate that node abilities to execute parallel tasks tend to follow a 3-parameter log-logistic distribution. Further statistical attributes, such as confidence intervals, quantile values, and extreme-case probabilities, can also be used to rank and identify potential straggler nodes within the cluster. We exploit a graph-based algorithm to partition server nodes into five levels, with 0.83% of node-level stragglers identified. Our work lays the foundation for enhancing scheduling algorithms by avoiding slow nodes, reducing task straggler occurrence, and improving parallel job performance.
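
For reference, one common parameterization of the three-parameter log-logistic distribution (location $\gamma$, scale $\alpha > 0$, shape $\beta > 0$) is given below; the abstract does not state which parameterization or fitting method the authors use. The quantile function follows by inverting the CDF, which is one way a quantile-based ranking threshold could be computed from fitted parameters.

```latex
\begin{aligned}
F(x;\alpha,\beta,\gamma) &= \frac{1}{1 + \left(\dfrac{x-\gamma}{\alpha}\right)^{-\beta}},
  \qquad x > \gamma,\\[4pt]
f(x;\alpha,\beta,\gamma) &= \frac{\dfrac{\beta}{\alpha}\left(\dfrac{x-\gamma}{\alpha}\right)^{\beta-1}}
  {\left[1 + \left(\dfrac{x-\gamma}{\alpha}\right)^{\beta}\right]^{2}},\\[4pt]
Q(p) = F^{-1}(p) &= \gamma + \alpha\left(\frac{p}{1-p}\right)^{1/\beta},
  \qquad 0 < p < 1.
\end{aligned}
```

Setting $z = (x-\gamma)/\alpha$, the CDF is $z^{\beta}/(1+z^{\beta})$; solving $p = z^{\beta}/(1+z^{\beta})$ for $z$ gives the quantile expression, and differentiating the CDF with respect to $x$ gives the density.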

Keywords: Servers, Production, Data models, Computational modeling, Analytical models, Time factors, Calculators
