Search Results - Inner Mongolia University Library

Towards implementation of a novel scheme for data prefetching on distributed shared memory systems

JOURNAL OF SUPERCOMPUTING, 2009, Vol. 47, No. 2, pp. 111-126

Authors: Wang, Hsiao-Hsi; Li, Kuan-Ching; Lu, Ssu-Hsuan; Yang, Chun-Chieh. Affiliations: Providence Univ, Dept Comp Sci & Informat Engn, Taichung 43301, Taiwan; Providence Univ, Dept Comp Sci & Informat Management, Taichung 43301, Taiwan

High-speed networks and rapidly improving microprocessor performance make networks of workstations an extremely important tool for parallel computing aimed at speeding up the execution of scientific applications. Shared memory is an attractive programming model for designing parallel and distributed applications, since the programmer can focus on algorithmic development rather than on data partitioning and communication. Based on this important characteristic, systems that provide the shared memory abstraction on physically distributed memory machines have been developed; these are known as distributed shared memory (DSM) systems. DSM is built using specific software that combines a number of computer hardware resources into one computing environment. Such an environment not only provides an easy way to execute parallel applications, but also combines available computational resources to speed up their execution. DSM systems need to maintain data consistency in memory, which usually leads to communication overhead. Therefore, a number of strategies exist that can be used to overcome this overhead and improve overall performance. Strategies such as prefetching have been shown to perform well in DSM systems, since they can reduce the communication latency of accessing data on remote nodes. On the other hand, these strategies also transfer unnecessary prefetched pages to remote nodes. In this research paper, we focus on the access pattern during the execution of a parallel application, and then analyze the data types and behavior of parallel applications. We propose an adaptive data classification scheme to improve the prefetching strategy, with the goal of improving overall performance. The adaptive data classification scheme classifies data according to the access sequence of pages, so that the home node uses the past access history of remote nodes to decide whether it needs to transfer related pages to them. From experimental resu

Keywords: distributed shared memory; Adaptive data classification scheme; Effective prefetch strategy
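The adaptive data classification scheme in the abstract above has the home node use each remote node's past access history to decide whether to push related pages along with a requested one. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes that the "related pages" of a faulting page are the pages the same remote node previously requested right after it; the class name `HomeNode` and the `threshold` parameter are hypothetical.

```python
from collections import defaultdict


class HomeNode:
    """Hypothetical home node that learns each remote node's page-access sequence.

    Assumption: the "related pages" of a faulting page are the pages that the
    same remote node has previously requested immediately after it.
    """

    def __init__(self, threshold=2):
        # Minimum number of observed transitions before a page is prefetched
        # (an assumed tuning knob, not a parameter from the paper).
        self.threshold = threshold
        # history[node][page][next_page] -> count of observed transitions
        self.history = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        # Last page each remote node requested from this home node
        self.last_request = {}

    def request(self, node, page):
        """Serve a page request from `node`; return (page, pages to prefetch)."""
        prev = self.last_request.get(node)
        if prev is not None:
            # Record the transition prev -> page in this node's history.
            self.history[node][prev][page] += 1
        self.last_request[node] = page
        # Push only successors seen often enough for *this* node, so pages the
        # node is unlikely to touch are not shipped.
        successors = self.history[node][page]
        prefetch = sorted(p for p, n in successors.items() if n >= self.threshold)
        return page, prefetch
```

Keeping the history per remote node is the design choice that lets the home node skip prefetching for a node whose access pattern differs, which targets the cost of transferring unnecessary pages noted in the abstract.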

Software Transactional Distributed Shared Memory

14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Dash, Alokika; Demsky, Brian. Affiliation: Univ Calif Irvine, Irvine, CA 92717, USA

ISBN (print): 9781605583976

We have developed a transaction-based approach to distributed shared memory (DSM) that supports object caching and generates path expression prefetches. A path expression specifies a path through the heap that traverses the objects to be prefetched. To our knowledge, this is the first prefetching approach that can prefetch objects whose addresses have not been computed or predicted. Our DSM uses both prefetching and caching of remote objects to hide network latency, while relying on the two-phase transaction commit mechanism to preserve the simple transactional consistency model that we present to the developer. We have evaluated this approach on a matrix multiply benchmark. We have found that our approach makes it possible to effectively utilize multiple machines in a cluster and to benefit from prefetching and caching of objects.

Keywords: Design; Algorithms; Transactional memory; distributed shared memory; Prefetching objects; Path-expression prefetch
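A path expression prefetch, as described above, names a route through the heap (for example "follow `next`, then `next`, then `data`") instead of a list of object addresses. The node that holds the objects can walk the route itself, so the requester never has to compute the intermediate addresses. The following Python sketch illustrates only that traversal step. The `Obj` type, the dict-based heap, and `eval_path_prefetch` are hypothetical stand-ins, not the paper's data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """A heap object: `fields` maps a field name to another object's id."""
    oid: int
    fields: dict = field(default_factory=dict)


def eval_path_prefetch(heap, root_oid, path):
    """Return the objects reached by following `path` (field names) from `root_oid`.

    Hypothetical sketch: the node holding the objects walks the path, so the
    requester does not need to know the intermediate object ids in advance.
    """
    result = []
    current = heap.get(root_oid)
    if current is None:
        return result
    result.append(current)
    for name in path:
        next_oid = current.fields.get(name)
        if next_oid is None or next_oid not in heap:
            break  # path ends early: null reference or object not held here
        current = heap[next_oid]
        result.append(current)
    return result
```

Under this sketch, a single request carrying the root id and the path returns every object along the route in one response. Fetching them one by one would cost a round trip per dereference.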

A Distributed Shared Memory Architecture for Occasionally Connected Mobile Environments

8th International Symposium on Advanced Parallel Processing Technologies

Authors: Schneble, Christophe; Seidmann, Thomas; Huser, Hansjorg. Affiliations: Institute for Networked Systems, University of Applied Science Rapperswil, Switzerland; Cdot AG, Altishofen (LU), Switzerland

ISBN (electronic): 9783642036446

ISBN (print): 9783642036439

In this paper we present a distributed cache architecture for occasionally connected systems. The system is realised using an underlying P2P infrastructure. The GridNet framework provides a transparent interface for working with distributed cache objects. The paper also contains a description of an envisioned example application running on top of the GridNet framework.

Keywords: distributed shared memory; object cache; peer to peer; P2P; healthcare; occasionally connected systems; mobile environments; mobile grids

A Homogeneous Many-core x86 Processor Full System Framework Based on NoC

International Conference on Computer Science and Network Technology

Authors: Qinhong Zhang; Meng Zhou; Juan Chen; Hao Yang. Affiliations: The 28th Research Institute of China Electronics Technology Group Corporation, Nanjing, China; Institute of Microelectronics, Tsinghua University, Beijing, China

ISBN (print): 9781467381741

With the development of processor technology, more and more processor cores are integrated on a single chip to meet higher performance requirements. Network-on-Chip (NoC) is the replacement for bus interconnection in many-core processor architectures. In this paper, we propose a NoC-based homogeneous many-core x86 processor architecture with distributed shared memory and a high-speed Network Interface (NI). By integrating the GEMS computer system simulator and the Booksim NoC simulator, we design a many-core x86 processor full system framework. Experimental results on the PARSEC benchmark show that the framework works efficiently with high performance.

Keywords: many-core processor; full system framework; Network-on-Chip; network-on-chip; PROCESSOR; NOCT gene; quad-issue processor; Multi-core processors; Frameworks; distributed shared memory

Quantifying Eventual Consistency with PBS

COMMUNICATIONS OF THE ACM, 2014, Vol. 57, No. 8, pp. 93-102

Authors: Bailis, Peter; Venkataraman, Shivaram; Franklin, Michael J.; Hellerstein, Joseph M.; Stoica, Ion. Affiliation: Univ Calif Berkeley, Berkeley, CA 94720, USA

Data replication results in a fundamental trade-off between operation latency and consistency. At the weak end of the spectrum of possible consistency models is eventual consistency, which provides no limit to the staleness of data returned. However, anecdotally, eventual consistency is often "good enough" for practitioners given its latency and availability benefits. In this work, we explain this phenomenon and demonstrate that, despite their weak guarantees, eventually consistent systems regularly return consistent data while providing lower latency than their strongly consistent counterparts. To quantify the behavior of eventually consistent stores, we introduce Probabilistically Bounded Staleness (PBS), a consistency model that provides expected bounds on data staleness with respect to both versions and wall clock time. We derive a closed-form solution for version-based staleness and model real-time staleness for a large class of quorum replicated, Dynamo-style stores. Using PBS, we measure the trade-off between latency and consistency for partial, non-overlapping quorum systems under Internet production workloads. We quantitatively demonstrate how and why eventually consistent systems frequently return consistent data within tens of milliseconds while offering large latency benefits.

Keywords: RESEARCH; distributed computing; OPEN source software; LATENT semantic analysis; CONSISTENCY models (Computers); PROBABILISTIC generative models; distributed shared memory; PERFORMANCE
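The abstract above mentions a closed-form solution for version-based staleness in partial quorum systems. The following Python sketch covers the standard calculation for N replicas with read quorum size R and write quorum size W. It assumes each quorum is chosen uniformly at random and independently, and that writes have committed before the read begins. A read quorum misses every replica of one write with probability C(N-W, R) / C(N, R). It returns something older than k versions only if it misses each of the last k writes, so the probability of reading a value within k versions is 1 - (C(N-W, R) / C(N, R))^k. Treat this as an illustration to check against the paper, not as its exact derivation; the function name `pbs_k_staleness` is hypothetical.

```python
from math import comb


def pbs_k_staleness(n, r, w, k=1):
    """Probability that a read returns a value within k versions of the latest.

    Sketch of version-based staleness for partial quorums, assuming read and
    write quorums are drawn uniformly at random and independently among the
    n replicas, and that the last k writes have committed.
    """
    if not (1 <= r <= n and 1 <= w <= n and k >= 1):
        raise ValueError("need 1 <= r, w <= n and k >= 1")
    # Probability that a size-r read quorum misses all w replicas of one write.
    # math.comb returns 0 when r > n - w, i.e. when the quorums must overlap.
    miss = comb(n - w, r) / comb(n, r)
    # Stale by more than k versions only if the read misses each of the last k writes.
    return 1 - miss ** k
```

For example, with N = 3 and R = W = 1, a single read sees the latest committed version with probability 1/3 under these assumptions. R = W = 2 (so R + W > N) gives 1, because the quorums always overlap.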

Software Transactional Distributed Shared Memory

ACM SIGPLAN NOTICES, 2009, Vol. 44, No. 4, pp. 297-298

Authors: Dash, Alokika; Demsky, Brian. Affiliation: Univ Calif Irvine, Irvine, CA 92717, USA

Keywords: Design; Algorithms; Transactional memory; distributed shared memory; Prefetching objects; Path-expression prefetch

ASPIRE: Exploiting Asynchronous Parallelism in Iterative Algorithms using a Relaxed Consistency based DSM

2014 ACM International Conference on Object-Oriented-Programming-Systems-Languages-and-Applications (OOPSLA 14)

Authors: Vora, Keval; Koduru, Sai Charan; Gupta, Rajiv. Affiliation: Univ Calif Riverside, CSE Dept, Riverside, CA 92521, USA

ISBN (print): 9781450325851

Many vertex-centric graph algorithms can be expressed using asynchronous parallelism by relaxing certain read-after-write data dependences and allowing threads to compute vertex values using stale (i.e., not the most recent) values of their neighboring vertices. We observe that on distributed shared memory systems, converting synchronous algorithms into their asynchronous counterparts makes them tolerant to high inter-node communication latency. However, high inter-node communication latency can lead to excessive use of stale values, increasing the number of iterations the algorithms require to converge. Although bounded staleness can restrict the slow-down in the rate of convergence, it also restricts the ability to tolerate communication latency. In this paper we design a relaxed memory consistency model and consistency protocol that simultaneously tolerate communication latency and minimize the use of stale values. This is achieved via a coordinated use of a best-effort refresh policy and bounded staleness. We demonstrate that for a range of asynchronous graph algorithms and PDE solvers, on average, our approach outperforms algorithms based upon prior relaxed memory models that allow stale values by at least 2.27x, and upon the Bulk Synchronous Parallel (BSP) model by 4.2x. We also show that our approach frequently outperforms GraphLab, a popular distributed graph processing framework.

Keywords: distributed shared memory; Communication Latency; Bounded Staleness; Best Effort Refresh; PDE Solvers; Graph Mining; Graph Analytics
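The abstract above combines bounded staleness with a best-effort refresh policy. The following Python sketch shows how the two mechanisms can interact in a per-node cache of remote values. It assumes staleness is measured as the age of a cached copy in iterations, and that refreshes are pushes from the home node that arrive whenever the network allows. The class `StaleBoundedCache`, its `bound` parameter, and the dict standing in for the home nodes are all hypothetical; this is not ASPIRE's actual protocol.

```python
class StaleBoundedCache:
    """Hypothetical per-node cache for remote vertex values, mixing bounded
    staleness with best-effort refresh (policy and names are assumptions)."""

    def __init__(self, home, bound):
        self.home = home          # dict standing in for the home nodes: key -> latest value
        self.bound = bound        # max tolerated age of a cached copy, in iterations
        self.cache = {}           # key -> (value, iteration the copy was obtained)
        self.blocking_fetches = 0

    def read(self, key, now):
        """Return a value for `key` at iteration `now`."""
        entry = self.cache.get(key)
        if entry is not None and now - entry[1] <= self.bound:
            # Possibly stale, but within the bound: no waiting on the network.
            return entry[0]
        # Cold miss or bound exceeded: block and fetch the current value.
        self.blocking_fetches += 1
        value = self.home[key]
        self.cache[key] = (value, now)
        return value

    def refresh(self, key, now):
        """Best-effort update pushed by the home node whenever it gets through;
        it resets the copy's age so later reads rarely hit the bound."""
        self.cache[key] = (self.home[key], now)
```

In this sketch the bound caps how stale a read can be. The refresh path keeps cached copies young, so a larger bound can hide latency without reading values that are staler than necessary. A real protocol would deliver refreshes as asynchronous messages rather than direct method calls.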

A comparative evaluation of hybrid distributed shared-memory systems

JOURNAL OF SYSTEMS ARCHITECTURE, 2009, Vol. 55, No. 1, pp. 43-52

Authors: Moga, Adrian; Dubois, Michel. Affiliations: Univ So Calif, Dept Elect Engn, Los Angeles, CA 90089, USA; Intel Corp, Hillsboro, OR 97124, USA

Distributed shared-memory (DSM) systems are shared-memory multiprocessor architectures in which each processor node contains a partition of the shared memory. In hybrid DSM systems, coherence among caches is maintained by a software-implemented coherence protocol relying on some hardware support. Hardware support is provided to satisfy every node hit (the common case), and software is invoked only for accesses to remote nodes. In this paper we compare the design and performance of four hybrid DSM organizations by detailed simulation of the same hardware platform. We have implemented the software protocol handlers for the four architectures. The handlers are written in C and assembly code. Coherence transactions are executed in trap and interrupt handlers. Together with the application, the handlers are executed in full detail in execution-driven simulations of six complete benchmarks with coarse-grain and fine-grain sharing. We relate our experience implementing and simulating the software protocols for the four architectures. Because the overhead of remote accesses is very high in hybrid systems, the system of choice differs from that for purely hardware systems. (c) 2008 Elsevier B.V. All rights reserved.

Keywords: distributed shared memory; Multiprocessor systems; NUMA; Software cache coherence; Execution-driven simulation

Distributed Programming Framework for Fast Iterative Optimization in Networked Cyber-Physical Systems

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2014, Vol. 13, No. 2 (Suppl.), p. 66

Authors: Balani, Rahul; Wanner, Lucas F.; Srivastava, Mani B. Affiliations: IBM Res Corp, New Delhi, India; Univ Calif Los Angeles, Dept Elect Engn, Los Angeles, CA 90024, USA; Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024, USA

Large-scale coordination and control problems in cyber-physical systems are often expressed within the networked optimization model. While significant advances have taken place in optimization techniques, their widespread adoption in practical implementations has been impeded by the complexity of internode coordination and the lack of programming support for it. Currently, application developers build their own elaborate coordination mechanisms for synchronized execution and coherent access to shared resources via distributed and concurrent controller processes. However, these mechanisms tend to be error-prone and inefficient due to tight constraints on application development time and cost. This is unacceptable in many CPS applications, as it can result in expensive and often irreversible side effects in the environment due to inaccurate or delayed reaction of the control system. This article explores the design of a distributed shared memory (DSM) architecture that abstracts the details of internode coordination. It simplifies application design by transparently managing routing, messaging, and discovery of nodes for coherent access to shared resources. Our key contribution is the design of provably correct locality-sensitive synchronization mechanisms that exploit the spatial locality inherent in actuation to drive faster and scalable application execution through opportunistic data-parallel operation. As a result, applications encoded in the proposed Hotline Application Programming Framework are error free and, in many scenarios, exhibit faster reactions to environmental events than conventional implementations. Relative to our prior work, this article extends Hotline with a new locality-sensitive coordination mechanism for improved reaction times and two tunable iteration control schemes for lower message costs. Our extensive evaluation demonstrates that realistic performance and cost of applications are highly sensitive to the prevalent deployment, network, an

Keywords: Design; Algorithms; Performance; Wireless sensor/actuator networks; distributed optimization; distributed shared memory; synchronization; subgradient methods

Eventually Consistent: Not What You Were Expecting?

COMMUNICATIONS OF THE ACM, 2014, Vol. 57, No. 3, pp. 38-44

Authors: Golab, Wojciech; Rahman, Muntasir R.; AuYoung, Alvin; Keeton, Kimberly; Li, Xiaozhou (Steve). Affiliations: Univ Waterloo, Waterloo, ON N2L 3G1, Canada; Hewlett Packard Labs, Palo Alto, CA, USA; Univ Illinois, Distributed Protocols Res Grp, Urbana, IL, USA; Google, Washington, DC, USA

Methods of quantifying consistency (or lack thereof) in eventually consistent storage systems.

Keywords: distributed computing; Client/server computing; Computer network architectures; Multiuser computer systems; Peer-to-peer architecture (Computer networks); distributed shared memory; distributed tracking
