Search Results - Inner Mongolia University Library

Search condition: "Institution=Shanghai Key Lab. of Scalable Computing and Systems Institute of Parallel and Distributed Systems"

17 records in total; showing 11-20

NUMA-aware graph-structured analytics

20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015

Authors: Zhang, Kaiyuan; Chen, Rong; Chen, Haibo. Affiliation: Shanghai Key Laboratory of Scalable Computing and Systems, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University, China

ISBN: (print) 9781450332057

Graph-structured analytics has been widely adopted in a number of big data applications such as social computation, web search and recommendation systems. Though much prior research focuses on scaling graph analytics in distributed environments, the strong desire for performance per core, dollar and joule has generated considerable interest in processing large-scale graphs on a single server-class machine, which may have several terabytes of RAM and 80 or more cores. However, prior graph-analytics systems are largely neutral to NUMA characteristics and thus have suboptimal performance. This paper presents a detailed study of NUMA characteristics and their impact on the efficiency of graph analytics. Our study uncovers two insights: 1) either random or interleaved allocation of graph data will significantly hamper data locality and parallelism; 2) sequential inter-node (i.e., remote) memory accesses have much higher bandwidth than both intra- and inter-node random ones. Based on them, this paper describes Polymer, a NUMA-aware graph-analytics system on multicore with two key design decisions. First, Polymer differentially allocates and places topology data, application-defined data and mutable runtime states of a graph system according to their access patterns to minimize remote accesses. Second, for some remaining random accesses, Polymer carefully converts random remote accesses into sequential remote accesses by using lightweight replication of vertices across NUMA nodes. To improve load balance and vertex convergence, Polymer is further built with a hierarchical barrier to boost parallelism and locality, an edge-oriented balanced partitioning for skewed graphs, and adaptive data structures according to the proportion of active vertices. A detailed evaluation on an 80-core machine shows that Polymer often outperforms the state-of-the-art single-machine graph-analytics systems, including Ligra, X-Stream and Galois, for a set of popular real-world and synthetic grap

Keywords: Random access storage
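
The abstract above mentions an edge-oriented balanced partitioning for skewed graphs. The following is only an illustrative sketch of that general idea, not Polymer's actual algorithm: it assumes vertices are split into contiguous ranges and that a vertex's cost is its degree. The function name `edge_balanced_ranges` and the sample degree list are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def edge_balanced_ranges(degrees, num_parts):
    """Split vertices 0..n-1 into contiguous ranges with roughly equal edge counts.

    Assumption for illustration: degrees[v] is the number of edges owned by v.
    """
    prefix = list(accumulate(degrees))  # prefix[i] = edges owned by vertices 0..i
    total = prefix[-1] if prefix else 0
    bounds = [0]
    for p in range(1, num_parts):
        target = total * p / num_parts
        # Cut right after the first vertex whose prefix sum reaches the target.
        cut = bisect_left(prefix, target) + 1
        # Keep boundaries monotonic and inside the vertex range.
        bounds.append(min(max(cut, bounds[-1]), len(degrees)))
    bounds.append(len(degrees))
    return [(bounds[i], bounds[i + 1]) for i in range(num_parts)]


# A skewed toy graph: vertex 0 owns half of all edges.
degrees = [8, 1, 1, 1, 1, 1, 1, 2]
parts = edge_balanced_ranges(degrees, 2)
edge_loads = [sum(degrees[a:b]) for a, b in parts]
print(parts, edge_loads)
```

On this toy input, splitting by vertex count would give the two halves 11 and 5 edges, whereas splitting by edge count gives each part 8 edges.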

SYNC or ASYNC: Time to fuse for distributed graph-parallel computation

20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015

Authors: Xie, Chenning; Chen, Rong; Guan, Haibing; Zang, Binyu; Chen, Haibo. Affiliations: Shanghai Key Laboratory of Scalable Computing and Systems, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University, China; Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science, Shanghai Jiao Tong University, China

ISBN: (print) 9781450332057

Large-scale graph-structured computation usually exhibits an iterative and convergence-oriented computing nature, where input data is computed iteratively until a convergence condition is reached. Such features have led to the development of two different computation modes for graph-structured programs, namely synchronous (Sync) and asynchronous (Async) modes. Unfortunately, there is currently no in-depth study on their execution properties, and thus programmers have to manually choose a mode, either requiring a deep understanding of underlying graph engines or suffering from suboptimal performance. This paper makes the first comprehensive characterization of the performance of the two modes on a set of typical graph-parallel applications. Our study shows that the performance of the two modes varies significantly with different graph algorithms, partitioning methods, execution stages, input graphs and cluster scales, and no single mode consistently outperforms the other. To this end, this paper proposes Hsync, a hybrid graph computation mode that adaptively switches a graph-parallel program between the two modes for optimal performance. Hsync constantly collects execution statistics on the fly and leverages a set of heuristics to predict future performance and determine when a mode switch could be profitable. We have built online sampling and offline profiling approaches combined with a set of heuristics to accurately predict future performance in the two modes. A prototype called PowerSwitch has been built based on PowerGraph, a state-of-the-art distributed graph-parallel system, to support adaptive execution of graph algorithms. On a 48-node EC2-like cluster, PowerSwitch consistently outperforms the best of both modes, with a speedup ranging from 9% to 73% due to timely switches between the two modes. Copyright 2015 ACM.

Keywords: Graphic methods
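
The abstract above describes switching between Sync and Async modes when a switch is predicted to be profitable. The sketch below is a hypothetical stand-in for such a decision rule, not Hsync's actual heuristics: it assumes each mode has a predicted throughput and that switching carries a fixed time cost. The names `ModeStats`, `choose_mode`, `switch_cost_s` and `remaining_updates` are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModeStats:
    # Hypothetical metric: predicted useful vertex updates per second.
    predicted_throughput: float


def choose_mode(current, stats, switch_cost_s, remaining_updates):
    """Return the mode ("sync" or "async") with the lower estimated finish time.

    Assumption: remaining work and throughput are known estimates, and a
    switch costs a fixed switch_cost_s seconds.
    """
    other = "async" if current == "sync" else "sync"
    time_if_stay = remaining_updates / stats[current].predicted_throughput
    time_if_switch = switch_cost_s + remaining_updates / stats[other].predicted_throughput
    return other if time_if_switch < time_if_stay else current


stats = {"sync": ModeStats(100.0), "async": ModeStats(200.0)}
# Plenty of work left: the switch cost is amortized, so switching pays off.
print(choose_mode("sync", stats, switch_cost_s=1.0, remaining_updates=1000))
# Little work left: the switch cost dominates, so staying is cheaper.
print(choose_mode("sync", stats, switch_cost_s=1.0, remaining_updates=100))
```

The point of the sketch is only that a faster mode is not always worth switching to; the cost of the switch has to be recovered by the remaining work.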

TinMan: Eliminating confidential mobile data exposure with security oriented offloading

10th European Conference on Computer Systems, EuroSys 2015

Authors: Xia, Yubin; Liu, Yutao; Tan, Cheng; Ma, Mingyang; Guan, Haibing; Zang, Binyu; Chen, Haibo. Affiliations: Shanghai Key Laboratory of Scalable Computing and Systems, China; Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University, China; Department of Computer Science, Shanghai Jiao Tong University, China

ISBN: (print) 9781450332385

The wide adoption of smart devices has stimulated a fast shift of security-critical data from desktop to mobile devices. However, recurrent device theft and loss expose mobile devices to various security threats and even physical attacks. This paper presents TinMan, a system that protects confidential data such as website passwords and credit card numbers (we use the term cor, short for Confidential Record, to represent these data) from being leaked or abused even under device theft. TinMan separates accesses to cor from the rest of the functionalities of an app by introducing a trusted node to store cor and offloading any code from a mobile device to the trusted node to access cor. This completely eliminates the exposure of cor on the mobile devices. The key challenges to TinMan include deciding when and how to efficiently and transparently offload execution. TinMan addresses these challenges with security-oriented offloading, using a low-overhead tainting scheme called asymmetric tainting to track accesses to cor and trigger offloading, as well as transparent SSL session injection and TCP payload replacement to offload accesses to cor. We have implemented a prototype of TinMan based on Android and demonstrated how TinMan protects a user's bank account information and credit card number without modifying the apps. Evaluation results also show that TinMan incurs only a small amount of performance and power overhead. Copyright © 2015 ACM.

Keywords: Crime

Computation and communication efficient graph processing with distributed immutable view

23rd ACM Symposium on High-Performance Parallel and Distributed Computing, HPDC 2014

Authors: Chen, Rong; Ding, Xin; Wang, Peng; Chen, Haibo; Zang, Binyu; Guan, Haibing. Affiliations: Shanghai Key Laboratory of Scalable Computing and Systems, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University, China; Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science, Shanghai Jiao Tong University, China

ISBN: (print) 9781450327480

Cyclops is a new vertex-oriented graph-parallel framework for writing distributed graph analytics. Unlike existing distributed graph computation models, Cyclops retains simplicity and computation efficiency by synchronously computing over a distributed immutable view, which grants a vertex read-only access to all its neighboring vertices. The view is provided via read-only replication of vertices for edges spanning machines during a graph cut. Cyclops follows a centralized computation model by assigning a master vertex to update and propagate the value to its replicas unidirectionally in each iteration, which can significantly reduce messages and avoid contention on replicas. Being aware of the pervasively available multicore-based clusters, Cyclops is further extended with a hierarchical processing model, which aggregates messages and replicas in a single multicore machine and transparently decomposes each worker into multiple threads on demand for different stages of computation. We have implemented Cyclops based on an open-source Pregel clone called Hama. Our evaluation using a set of graph algorithms on an in-house multicore cluster shows that Cyclops outperforms Hama by 2.06X to 8.69X and 5.95X to 23.04X using hash-based and Metis partition algorithms respectively, due to the elimination of contention on messages and hierarchical optimization for the multicore-based clusters. Cyclops (written in Java) also has comparable performance with PowerGraph (written in C++) despite the language difference, due to the significantly lower number of messages and avoided contention. Copyright © 2014 ACM.

Keywords: Iterative methods

Bipartite-oriented distributed graph partitioning for big learning

5th ACM Asia-Pacific Workshop on Systems, APSYS 2014

Authors: Chen, Rong; Shi, Jiaxin; Zang, Binyu; Guan, Haibing. Affiliations: Shanghai Key Laboratory of Scalable Computing and Systems, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University, China; Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science, Shanghai Jiao Tong University, China

ISBN: (print) 9781450330244

Many machine learning and data mining (MLDM) problems like recommendation, topic modeling and medical diagnosis can be modeled as computing on bipartite graphs. However, most distributed graph-parallel systems are oblivious to the unique characteristics of such graphs, and existing online graph partitioning algorithms usually cause excessive replication of vertices as well as significant pressure on network communication. This article identifies the challenges and opportunities of partitioning bipartite graphs for distributed MLDM processing and proposes BiGraph, a set of bipartite-oriented graph partitioning algorithms. BiGraph leverages observations such as the skewed distribution of vertices, discriminated computation load and imbalanced data sizes between the two subsets of vertices to derive a set of optimal graph partition algorithms that result in minimal vertex replication and network communication. BiGraph has been implemented on PowerGraph and is shown to have a performance boost of up to 17.75X (from 1.38X) for four typical MLDM algorithms, owing to reductions of up to 62% in vertex replication and up to 96% in network traffic. © 2014 ACM.

Keywords: Data mining

Concurrent and consistent virtual machine introspection with hardware transactional memory

20th IEEE International Symposium on High Performance Computer Architecture, HPCA 2014

Authors: Liu, Yutao; Xia, Yubin; Guan, Haibing; Zang, Binyu; Chen, Haibo. Affiliations: Shanghai Key Laboratory of Scalable Computing and Systems, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University, China; Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science, Shanghai Jiao Tong University, China

ISBN: (print) 9781479930975

Virtual machine introspection, which provides tamper-resistant, high-fidelity 'out of the box' monitoring of virtual machines, has many prominent security applications including VM-based intrusion detection, malware analysis and memory forensic analysis. However, prior approaches either are intrusive, stopping the world to avoid race conditions between introspection tools and the guest VM, or provide no guarantee of getting a consistent state of the guest VM. Further, there is currently no effective means of examining the VM states in question in a timely manner. In this paper, we propose a novel approach, called TxIntro, which retrofits hardware transactional memory (HTM) for concurrent, timely and consistent introspection of guest VMs. Specifically, TxIntro leverages the strong atomicity of HTM to actively monitor updates to critical kernel data structures. TxIntro can then mount introspection to detect malicious tampering in a timely manner. To avoid fetching inconsistent kernel states for introspection, TxIntro uses HTM to add related synchronization states into the read set of the monitoring core and thus can easily detect potential in-flight concurrent kernel updates. We have implemented and evaluated TxIntro based on Xen VMM on a commodity Intel Haswell machine that provides restricted transactional memory (RTM) support. To demonstrate the effectiveness of TxIntro, we implemented a set of kernel rootkit detectors using TxIntro. Evaluation results show that TxIntro is effective in detecting these rootkits and is efficient, adding negligible performance overhead. © 2014 IEEE.

Keywords: Malware

Replication-Based Fault-Tolerance for Large-Scale Graph Processing

International Conference on Dependable Systems and Networks (DSN)

Authors: Peng Wang; Kaiyuan Zhang; Rong Chen; Haibo Chen; Haibing Guan. Affiliations: Shanghai Key Laboratory of Scalable Computing and Systems, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Department of Computer Science, Shanghai Jiao Tong University

The increasing algorithm complexity and dataset sizes necessitate the use of networked machines for many graph-parallel algorithms, which also makes fault tolerance a must due to the increasing scale of machines. Unfortunately, existing large-scale graph-parallel systems usually adopt a distributed checkpoint mechanism for fault tolerance, which incurs not only notable performance overhead but also lengthy recovery time. This paper observes that the vertex replicas created for distributed graph computation can be naturally extended for fast in-memory recovery of graph states. This paper proposes Imitator, a new fault tolerance mechanism that supports cheap maintenance of vertex states by replicating vertex states to their replicas during normal message exchanges, and provides fast in-memory reconstruction of failed vertices from replicas on other machines. Imitator has been implemented by extending Hama, a popular open-source clone of Pregel. Evaluation shows that Imitator incurs negligible performance overhead (less than 5% for all cases) and can recover from failures of more than one million vertices in less than 3.4 seconds.

Keywords: Fault tolerance; Fault tolerant systems; Computer crashes; Checkpointing; Synchronization; Clustering algorithms; Computational modeling
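
The abstract above describes recovering failed vertices from replicas held on other machines. The sketch below illustrates that idea in a simplified, single-process form; it is not Imitator's implementation. The `Machine` class, `sync_replicas`, `recover` and `replica_map` are hypothetical names, and the sketch assumes every master vertex has at least one replica on a surviving machine.

```python
class Machine:
    """A toy stand-in for one worker in a distributed graph engine."""

    def __init__(self, name):
        self.name = name
        self.masters = {}   # vertex id -> state owned by this machine
        self.replicas = {}  # vertex id -> (owner machine name, copied state)


def sync_replicas(machines, replica_map):
    """Copy each master's state to the machines listed in replica_map.

    replica_map (hypothetical): vertex id -> names of machines holding a replica.
    """
    by_name = {m.name: m for m in machines}
    for m in machines:
        for vertex, state in m.masters.items():
            for holder in replica_map.get(vertex, []):
                by_name[holder].replicas[vertex] = (m.name, state)


def recover(failed_name, survivors):
    """Rebuild the failed machine's master states from surviving replicas."""
    recovered = {}
    for m in survivors:
        for vertex, (owner, state) in m.replicas.items():
            if owner == failed_name:
                recovered[vertex] = state
    return recovered


a, b = Machine("a"), Machine("b")
a.masters = {1: 0.5, 2: 0.25}
b.masters = {3: 1.0}
sync_replicas([a, b], {1: ["b"], 2: ["b"], 3: ["a"]})
# Suppose machine "a" fails: its vertices are rebuilt from b's replicas.
print(recover("a", [b]))
```

A vertex with no replica on any surviving machine would be lost in this sketch; how a real system handles that case is outside what the sketch shows.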
