Distributed shared-memory (DSM) systems are shared-memory multiprocessor architectures in which each processor node contains a partition of the shared memory. In hybrid DSM systems, coherence among caches is maintained by a software-implemented coherence protocol that relies on some hardware support. The hardware satisfies every node hit (the common case), and software is invoked only for accesses to remote nodes. In this paper we compare the design and performance of four hybrid DSM organizations through detailed simulation of the same hardware platform. We have implemented the software protocol handlers for the four architectures. The handlers are written in C and assembly code, and coherence transactions are executed in trap and interrupt handlers. Together with the application, the handlers are executed in full detail in execution-driven simulations of six complete benchmarks with coarse-grain and fine-grain sharing. We relate our experience implementing and simulating the software protocols for the four architectures. Because the overhead of remote accesses is very high in hybrid systems, the system of choice differs from that for purely hardware systems. (c) 2008 Elsevier B.V. All rights reserved.
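As a rough illustration of what such a software handler does, the C sketch below resolves a read miss to a remote node by fetching the block and marking it locally readable. All names here (fetch_remote_block, the block table, the two-state bookkeeping) are our own simplifications, not the paper's handlers, which run in trap and interrupt context on real hardware.

/* Minimal sketch of a software coherence handler for a hybrid DSM node.
 * Hypothetical names throughout; a real handler runs in a trap/interrupt
 * context and talks to a directory at the block's home node. */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define BLOCK_SIZE 64
#define NUM_BLOCKS 1024

enum state { INVALID, SHARED, EXCLUSIVE };

struct block_entry {
    enum state st;
    uint8_t    data[BLOCK_SIZE];
};

static struct block_entry local_cache[NUM_BLOCKS];

/* Placeholder for the message-layer call that fetches a block from its
 * home node; a real handler would issue a network request here. */
static void fetch_remote_block(int home, uint64_t addr, uint8_t *buf)
{
    memset(buf, 0, BLOCK_SIZE);            /* stand-in for the network reply */
    printf("fetch block 0x%llx from node %d\n",
           (unsigned long long)addr, home);
}

/* Invoked (conceptually from a trap) when a load misses on the local node. */
void remote_read_miss_handler(int home, uint64_t addr)
{
    struct block_entry *e = &local_cache[(addr / BLOCK_SIZE) % NUM_BLOCKS];
    if (e->st == INVALID) {
        fetch_remote_block(home, addr, e->data);
        e->st = SHARED;                    /* block is now readable locally */
    }
}

int main(void)
{
    remote_read_miss_handler(2, 0x4000);
    return 0;
}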
ISBN (print): 9780897918169
The message-passing programs are executed with the Parallel Virtual Machine (PVM) library, and the shared-memory programs are executed using TreadMarks. The programs are Water and Barnes-Hut from the SPLASH benchmark suite; 3-D FFT, Integer Sort (IS), and Embarrassingly Parallel (EP) from the NAS benchmarks; ILINK, a widely used genetic linkage analysis program; and Successive Over-Relaxation (SOR), Traveling Salesman (TSP), and Quicksort (QSORT). Two different input data sets were used for Water (Water-288 and Water-1728), IS (IS-Small and IS-Large), and SOR (SOR-Zero and SOR-NonZero). Our execution environment is a set of eight HP735 workstations connected by a 100 Mbit/s FDDI network. For Water-1728, EP, ILINK, SOR-Zero, and SOR-NonZero, the performance of TreadMarks is within 10% of PVM. For IS-Small, Water-288, Barnes-Hut, 3-D FFT, TSP, and QSORT, differences are on the order of 10% to 30%. Finally, for IS-Large, PVM performs two times better than TreadMarks. More messages and more data are sent in TreadMarks, explaining the performance differences. This extra communication is caused by 1) the separation of synchronization and data transfer, 2) extra messages to request updates for data under the invalidate protocol used in TreadMarks, 3) false sharing, and 4) diff accumulation for migratory data in TreadMarks.
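Point 4) refers to the twin/diff technique used by page-based software DSMs such as TreadMarks: a copy (the "twin") of a page is saved before its first write, and at synchronization time only the words that differ from the twin are shipped. The C sketch below shows the diff step in its simplest form; the page size, flat (offset, value) encoding, and function names are our simplifications, not TreadMarks code.

/* Sketch of twin/diff creation in a page-based software DSM. */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define PAGE_WORDS 1024

/* Emit (offset, value) pairs for every word changed since twinning.
 * Returns the number of modified words, i.e. the size of the diff. */
static size_t make_diff(const uint32_t *page, const uint32_t *twin,
                        uint32_t *off_out, uint32_t *val_out)
{
    size_t n = 0;
    for (size_t i = 0; i < PAGE_WORDS; i++) {
        if (page[i] != twin[i]) {
            off_out[n] = (uint32_t)i;
            val_out[n] = page[i];
            n++;
        }
    }
    return n;
}

int main(void)
{
    static uint32_t twin[PAGE_WORDS], page[PAGE_WORDS];
    memcpy(page, twin, sizeof(twin));  /* twin taken before the first write */
    page[3] = 42; page[700] = 7;       /* two writes dirty the page */

    static uint32_t offs[PAGE_WORDS], vals[PAGE_WORDS];
    size_t n = make_diff(page, twin, offs, vals);
    printf("diff carries %zu of %d words\n", n, PAGE_WORDS);
    return 0;
}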
To support a global virtual memory space, an architecture must translate virtual addresses dynamically. In current processors, the translation is done in a TLB (Translation Lookaside Buffer), before or in parallel with the first-level cache access. As processor technology improves at a rapid pace and the working sets of new applications grow insatiably, the latency and bandwidth demands on the TLB are difficult to meet, especially in multiprocessor systems, which run larger applications and are plagued by the TLB consistency problem. We describe and compare five options for virtual address translation in the context of distributed shared memory (DSM) multiprocessors, including CC-NUMAs (Cache-Coherent Non-Uniform Memory Access architectures) and COMAs (Cache-Only Memory Access architectures). In CC-NUMAs, moving the TLB to shared memory is a bad idea because page placement, migration, and replication are all constrained by the virtual page address, which greatly affects processor node access locality. In COMAs, the allocation of pages to processor nodes is not as critical because memory blocks can migrate and replicate freely among nodes. As the address translation is done deeper in the memory hierarchy, the frequency of translations drops because of the filtering effect. We also observe that the TLB is very effective when it is merged with the shared memory, because of the sharing and prefetching effects and because there is no need to maintain TLB consistency. Although the effectiveness of a TLB merged with the shared memory is very high, we also show that the TLB can be removed entirely in a system where address translation is done in memory, because the frequency of translations is very low.
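For reference, the sketch below shows the conventional baseline that the paper's five options depart from: a TLB consulted first, with a page-table walk on a miss. The software-managed, direct-mapped organization, the sizes, and the toy linear page table are illustrative assumptions of ours, not any of the studied designs.

/* Toy TLB lookup with a page-table walk on a miss. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define TLB_SIZE   64
#define NUM_PAGES  4096

struct tlb_entry { uint64_t vpn; uint64_t pfn; int valid; };

static struct tlb_entry tlb[TLB_SIZE];
static uint64_t page_table[NUM_PAGES];   /* vpn -> pfn, toy linear table */

static uint64_t translate(uint64_t vaddr, int *tlb_hit)
{
    uint64_t vpn = vaddr >> PAGE_SHIFT;
    struct tlb_entry *e = &tlb[vpn % TLB_SIZE];   /* direct-mapped slot */

    if (e->valid && e->vpn == vpn) {
        *tlb_hit = 1;
    } else {                                      /* miss: walk the table */
        *tlb_hit = 0;
        e->vpn = vpn;
        e->pfn = page_table[vpn % NUM_PAGES];
        e->valid = 1;
    }
    return (e->pfn << PAGE_SHIFT) | (vaddr & ((1u << PAGE_SHIFT) - 1));
}

int main(void)
{
    for (uint64_t i = 0; i < NUM_PAGES; i++) page_table[i] = i + 100;
    int hit;
    uint64_t pa = translate(0x12345, &hit);      /* first access misses */
    printf("paddr=0x%llx tlb_hit=%d\n", (unsigned long long)pa, hit);
    pa = translate(0x12345, &hit);               /* second access hits */
    printf("paddr=0x%llx tlb_hit=%d\n", (unsigned long long)pa, hit);
    return 0;
}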
ISBN (print): 9781450350280
Next-generation non-volatile memories (NVMs) will provide byte addressability, persistence, high density, and DRAM-like performance. They have the potential to benefit many datacenter applications. However, most previous research on NVMs has focused on using them in a single-machine environment. It is still unclear how to best utilize them in distributed, datacenter environments. We introduce Distributed Shared Persistent Memory (DSPM), a new framework for using persistent memories in distributed datacenter environments. DSPM provides a new abstraction that allows applications to both perform traditional memory load and store instructions and to name, share, and persist their data. We built Hotpot, a kernel-level DSPM system that provides low-latency, transparent memory accesses, data persistence, data reliability, and high availability. The key ideas of Hotpot are to integrate distributed memory caching and data replication techniques and to exploit application hints. We implemented Hotpot in the Linux kernel and demonstrated its benefits by building a distributed graph engine on Hotpot and porting a NoSQL database to Hotpot. Our evaluation shows that Hotpot outperforms a recent distributed shared memory system by 1.3x to 3.2x and a recent distributed PM-based file system by 1.5x to 3.0x.
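To make the abstraction concrete, the sketch below shows the general shape of a DSPM-style interface: map a named persistent region, access it with ordinary loads and stores, then ask the system to persist and replicate the writes. The dspm_open/dspm_commit calls are invented for illustration and are not Hotpot's actual kernel API.

/* Hypothetical DSPM-style usage: name, load/store, persist. */
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/* Stand-in for mapping a named persistent region into the address space. */
static void *dspm_open(const char *name, size_t len)
{
    printf("map persistent region '%s' (%zu bytes)\n", name, len);
    return malloc(len);                      /* toy local stand-in */
}

/* Stand-in for making a range of writes durable and replicated. */
static void dspm_commit(void *addr, size_t len)
{
    (void)addr;
    printf("persist + replicate %zu bytes\n", len);
}

int main(void)
{
    char *graph = dspm_open("/hotpot/graph-edges", 4096);
    if (!graph) return 1;
    strcpy(graph, "edge: 1 -> 2");           /* ordinary store instructions */
    dspm_commit(graph, 4096);                /* make the update durable */
    free(graph);
    return 0;
}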
ISBN (print): 0769509886
We present a computation model to describe a clustered memory hierarchy of distributed shared memory machines. The computation model includes the access to shared data stored in different levels of the hierarchy as well as the transfer of entire blocks of data between different levels of the memory. Pure shared-memory machines and pure message-passing machines can be expressed within the model. As an example, we use the model to analyze a hierarchical matrix multiplication algorithm.
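To give a flavor of what such an analysis yields, here is a toy cost expression of our own (not the paper's actual model) for multiplying two n x n matrices split into b x b blocks, with t_c the cost per scalar multiply-add and t_B the cost of moving one block between adjacent memory levels; each of the (n/b)^3 block multiplications loads one block of A and one of B, while each block of C is written back once:

\[
  T(n,b) \;\approx\;
  \underbrace{2n^{3}\,t_c}_{\text{arithmetic}}
  \;+\;
  \underbrace{2\left(\tfrac{n}{b}\right)^{3} t_B}_{\text{loading blocks of } A \text{ and } B}
  \;+\;
  \underbrace{\left(\tfrac{n}{b}\right)^{2} t_B}_{\text{writing back } C},
  \qquad t_B \propto b^{2}.
\]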
ISBN (print): 9798400706202
In this paper, we propose a new library for storing arrays in a distributed fashion on distributed-memory systems. From a programmer's perspective, these arrays behave for arbitrary reads as if they were allocated in shared memory. When it comes to writes into these arrays, the programmer has to ensure that all writes are restricted to a fixed range of addresses that is "owned" by the node executing the writing operation. We show how this design, despite the owner-compute restriction, can aid programmer productivity by enabling straightforward parallelisations of typical array-manipulating codes. Furthermore, we delineate an open-source implementation of the proposed library named Shray. Using the programming interface of Shray, we compare possible hand-parallelised codes of example applications with implementations in other DSM/PGAS systems, demonstrating the programming style enabled by Shray and providing some initial performance figures.
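The C sketch below illustrates the owner-compute discipline described above, with a plain local array and a node loop standing in for the distributed arrays and nodes; the partitioning into contiguous slices is an assumption for illustration, and no actual Shray calls appear. Each simulated node reads neighboring elements freely but writes only inside its owned slice.

/* Owner-compute style on a 1-D stencil: read anywhere, write only owned. */
#include <stdio.h>

#define N     16
#define NODES 4

int main(void)
{
    static double a[N], b[N];
    for (int i = 0; i < N; i++) a[i] = (double)i;

    /* The node loop stands in for the distributed execution. */
    for (int node = 0; node < NODES; node++) {
        int lo = node * (N / NODES);           /* owned range of this node */
        int hi = lo + (N / NODES);
        for (int i = lo; i < hi; i++) {
            double left  = (i > 0)     ? a[i - 1] : 0.0;  /* remote reads ok */
            double right = (i < N - 1) ? a[i + 1] : 0.0;
            b[i] = 0.5 * (left + right);       /* write stays in owned range */
        }
    }
    printf("b[5] = %f\n", b[5]);
    return 0;
}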
ISBN (print): 9781450344876
Parallel programming paradigms are commonly characterized by the core metrics of scalability, memory use, ease of use, hardware requirements, and resiliency. Increasingly, support for heterogeneous environments, for example a mix of CPUs and accelerators, is of interest. Analysis of the semantics of different classes of parallel programming paradigms and their cost leads to DYCE (Distributed Yet Common Environment), a shared-memory, rich but hardware-friendly, race- and deadlock-free parallel programming paradigm that allows for resiliency without the need for explicit checkpointing code. Pointer-based structures that span the memory of multiple heterogeneous compute devices are possible. Importantly, data exchange is independent of the specific data structures and does not require serialization and deserialization code, even for data structures such as a dynamic linked radix tree of strings. The analysis shows that DYCE does not require coherence from the system and thus can be executed with near-minimal overhead and hardware requirements, including the page table cost for large unified address spaces that span many devices. We demonstrate efficacy with a prototype.
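One way to see why serialization-free exchange of pointer-based structures is possible is the classic trick of mapping a shared region at the same virtual base on every device, so raw pointers inside the region stay valid everywhere. The C sketch below illustrates that general idea only; it is our illustration, not DYCE's mechanism.

/* Pointers inside a commonly-mapped region need no encode/decode step. */
#include <stdio.h>
#include <stddef.h>

struct node { int key; struct node *next; };

#define REGION_NODES 8
static struct node region[REGION_NODES];  /* stand-in for the shared region */

int main(void)
{
    /* Build a linked list entirely inside the shared region. */
    for (int i = 0; i < REGION_NODES - 1; i++) {
        region[i].key  = i;
        region[i].next = &region[i + 1];   /* pointer into the same region */
    }
    region[REGION_NODES - 1].key  = REGION_NODES - 1;
    region[REGION_NODES - 1].next = NULL;

    /* Another device mapping the region at the same base could walk these
     * pointers directly; no serialization is needed for the handover. */
    for (struct node *p = &region[0]; p; p = p->next)
        printf("%d ", p->key);
    printf("\n");
    return 0;
}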
The performance of three schemes for binding remote data to memory local to a node is evaluated. When the working set is large relative to the cache size, many cache misses occur, so binding at page-fault time alone cannot efficiently exploit locality of reference in the local memory. When the working set is small, an address bound to the local memory at node-miss time is not effective because cache miss rates are low. Our simulation shows that binding at cache-miss time achieves up to 3.1 times the performance of binding at page-fault time and up to 2.4 times that of binding at node-miss time.
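The sketch below illustrates cache-miss-time binding, the best-performing scheme: every cache miss also installs the block into node-local memory, so subsequent misses to the same block are satisfied locally rather than remotely. The data structures and names are our simplifications, not the paper's simulator.

/* Cache-miss-time binding of remote blocks into node-local memory. */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define BLOCK 64
#define SLOTS 256

struct slot { uint64_t tag; int valid; uint8_t data[BLOCK]; };
static struct slot local_mem[SLOTS];          /* node-local binding store */

/* Stand-in for fetching a block from its remote home node. */
static void fetch_from_home(uint64_t addr, uint8_t *buf)
{
    memset(buf, (int)(addr & 0xff), BLOCK);
}

/* On a cache miss, bind the block locally so later misses stay local. */
void on_cache_miss(uint64_t addr, uint8_t *line_out)
{
    struct slot *s = &local_mem[(addr / BLOCK) % SLOTS];
    if (!(s->valid && s->tag == addr / BLOCK)) {
        fetch_from_home(addr, s->data);       /* remote only on first miss */
        s->tag = addr / BLOCK;
        s->valid = 1;
    }
    memcpy(line_out, s->data, BLOCK);         /* served locally from now on */
}

int main(void)
{
    uint8_t line[BLOCK];
    on_cache_miss(0x1000, line);   /* remote fetch + local binding */
    on_cache_miss(0x1000, line);   /* satisfied from local memory */
    printf("line[0]=%u\n", line[0]);
    return 0;
}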
ISBN (print): 9798400715389
As AI models grow exponentially in size, memory has emerged as a critical bottleneck for inference at scale. While hardware solutions like Compute Express Link (CXL) promise to solve the problem of memory capacity and sharing, they require capital investment and are not widely available. This paper presents RMAI, an in-kernel remote shared memory framework tailored for AI inference workloads, offering a transparent, scalable, and cost-effective software alternative to hardware-based memory expansion and sharing solutions. By leveraging the operating system's capabilities, RMAI introduces dynamic virtual memory regions that reduce page faults, minimize overheads associated with user-kernel transitions, and optimize data locality for inference workloads. In this paper, we particularly focus on Mixture-of-Experts (MoE) models. In this initial evaluation we demonstrate that RMAI achieves performance levels comparable to CXL-like architectures, with up to 10x faster expert switching and reduced memory management overhead across large-scale inference tasks compared to disk-based solutions. This work redefines the role of remote shared memory in AI systems, positioning it as a practical and high-performance solution for memory capacity and sharing in modern data centers.
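The C sketch below caricatures the expert-switching path this speedup comes from: with expert weights resident in a remote shared memory pool, activating a different expert is a pointer/mapping change rather than a disk read. All names (remote_pool, switch_expert) are ours for illustration, not RMAI's interface.

/* Toy MoE expert switching against a remote memory pool. */
#include <stdio.h>

#define EXPERTS       8
#define EXPERT_FLOATS 4

/* Stand-in for expert weights kept resident in a remote memory pool. */
static float remote_pool[EXPERTS][EXPERT_FLOATS];

static const float *active_expert;   /* slot used by the inference loop */

static void switch_expert(int id)
{
    /* With remote shared memory this is a remap, not a disk read. */
    active_expert = remote_pool[id];
    printf("switched to expert %d\n", id);
}

int main(void)
{
    for (int e = 0; e < EXPERTS; e++)
        for (int i = 0; i < EXPERT_FLOATS; i++)
            remote_pool[e][i] = (float)(e * 10 + i);

    switch_expert(3);                 /* router picked expert 3 */
    printf("w0 = %f\n", active_expert[0]);
    return 0;
}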