Particle-In-Cell (PIC) methods have been widely used for plasma physics simulations over the past three decades. To ensure an acceptable level of statistical accuracy, relatively large numbers of particles are needed. State-of-the-art Graphics Processing Units (GPUs), with their high memory bandwidth, hundreds of SPMD processors, and half-a-teraflop performance potential, offer a viable alternative to distributed memory parallel computers for running medium-scale PIC plasma simulations on inexpensive commodity hardware. In this paper, we present an overview of a typical plasma PIC code and discuss its GPU implementation. In particular, we focus on fast algorithms for the performance bottleneck operation of particle-to-grid interpolation. (c) 2008 Elsevier Inc. All rights reserved.
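As an illustration of the interpolation step this abstract singles out, below is a minimal 1-D sketch of linear (cloud-in-cell) particle-to-grid charge deposition in plain Python; the uniform periodic grid, function and array names, and NumPy usage are illustrative assumptions, not details from the paper, and the paper's GPU algorithms are not reproduced here.

```python
import numpy as np

def deposit_charge_1d(positions, charges, n_cells, dx):
    """Linear (cloud-in-cell) particle-to-grid deposition on a uniform 1-D grid.

    Each particle's charge is split between the two grid points that bracket it,
    weighted by proximity. This scatter step is the particle-to-grid
    interpolation that typically dominates PIC run time.
    """
    rho = np.zeros(n_cells)
    for x, q in zip(positions, charges):
        s = x / dx                       # position in grid units
        i = int(np.floor(s)) % n_cells   # left grid point (periodic domain)
        w = s - np.floor(s)              # fractional distance from the left point
        rho[i] += q * (1.0 - w)
        rho[(i + 1) % n_cells] += q * w
    return rho

# Example: four particles deposited onto an 8-cell periodic grid of spacing 1.0
print(deposit_charge_1d([0.25, 1.5, 3.9, 7.6], [1.0, 1.0, -1.0, 1.0], 8, 1.0))
```

On a GPU, many particles scatter contributions into the same grid cell concurrently, which is why this step becomes the performance bottleneck the paper targets.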
ISBN (Print): 0769520863
A shared memory abstraction can be robustly emulated over an asynchronous message passing system where any process can fail by crashing and possibly recover (crash-recovery model), by having the processes (a) exchange messages to synchronize their read and write operations and (b) log key information on their local stable storage. This paper extends the existing atomicity consistency criterion, defined for multi-writer/multi-reader shared memory in the crash-stop model, by providing two new criteria for the crash-recovery model. We introduce lower bounds on the log-complexity for each of the two corresponding types of robust shared memory emulations. We demonstrate that our lower bounds are tight by providing algorithms that match them. Besides being optimal, these algorithms have the same message and time complexity as the most efficient counterparts we know of in the crash-stop model.
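The recipe in the abstract, (a) message exchange to synchronize reads and writes and (b) logging to stable storage, can be pictured with the toy single-writer register emulation below; the majority-quorum scheme, timestamps, JSON log files, and all names are illustrative assumptions and not the paper's optimal algorithms.

```python
import json, os

class Replica:
    """One server replica: holds (timestamp, value) and logs it to stable storage."""
    def __init__(self, log_path):
        self.log_path = log_path
        self.ts, self.value = 0, None
        if os.path.exists(log_path):                  # recover state after a crash
            self.ts, self.value = json.load(open(log_path))

    def write(self, ts, value):
        if ts > self.ts:
            self.ts, self.value = ts, value
            json.dump([ts, value], open(self.log_path, "w"))  # log before acking
        return True                                   # ack

    def read(self):
        return self.ts, self.value

def write_register(replicas, ts, value):
    """A write completes once a majority of replicas have logged and acked it."""
    acks = sum(r.write(ts, value) for r in replicas)
    assert acks > len(replicas) // 2

def read_register(replicas):
    """A read returns the highest-timestamped value seen at a majority.
    (A full atomic emulation would also write that value back to a majority.)"""
    majority = replicas[: len(replicas) // 2 + 1]
    return max((r.read() for r in majority), key=lambda tv: tv[0])[1]

# Example: three replicas backed by local log files (paths are placeholders)
reps = [Replica(f"/tmp/reg_{i}.log") for i in range(3)]
write_register(reps, ts=1, value="hello")
print(read_register(reps))                            # -> "hello"
```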
ISBN (Print): 9781728170022
The computation- and memory-intensiveness of deep learning models have made deploying model inference on edge devices with limited resource and energy budgets challenging. Non-volatile memory (NVM) based in-memory computing has been proposed to reduce data movement as well as energy consumption, which could alleviate this challenge. Racetrack memory is a newly introduced memory technology; it allows high-density fabrication and is thus a good fit for in-memory computing. To facilitate the deployment of deep learning models on edge devices, we present a racetrack-memory-based in-memory integer multiplication, one of the core operations in compressed deep learning models. The presented multiplication can be constructed efficiently with the racetrack memory technique and performs the logical operations in the memory cells with partial reuse of the peripheral circuits. In addition to the multiplication architecture, we also propose and apply a novel write optimization method to the integer multiplication, which transforms the required write operations into shift operations for performance and energy efficiency. The resulting design achieves high area and energy efficiency while maintaining performance comparable to its CMOS counterpart.
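To make the write-to-shift idea concrete, here is a plain-software sketch of shift-and-add integer multiplication, the decomposition such in-memory designs exploit; the 8-bit width and function name are illustrative, and the racetrack cell operations and peripheral circuits themselves are not modeled.

```python
def shift_add_multiply(a: int, b: int, width: int = 8) -> int:
    """Unsigned shift-and-add multiplication.

    Each set bit of b contributes a shifted copy of a to the product, so the
    whole computation reduces to shifts and additions, the kind of
    decomposition that in-memory designs can map onto cheap shift operations.
    """
    product = 0
    for i in range(width):
        if (b >> i) & 1:
            product += a << i
    return product

assert shift_add_multiply(23, 41) == 23 * 41
```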
ISBN (Print): 9780769550886
Although the capacities of persistent storage devices have evolved rapidly in recent years, the bandwidth between memory and persistent storage devices is still the bottleneck. As loosely coupled data sharing applications running in cluster environments may need an enormous number of files, access to these files might become the bottleneck. With the rapid development of servers and high-speed networks, much work has been done on distributed memory caches that minimize data requests to the centralized filesystem. These systems have the drawback that nodes are coupled together to form a distributed cache statically, which is a difficult administrative task in changing environments like clusters. Current high performance computing resources support batch job submission using distributed resource management systems like TORQUE, but how to use the resource management system to set up a self-organizing distributed memory cache on demand has rarely been studied. In this paper, we design a framework for dynamically setting up a distributed memory cache for data sharing applications. Shared files are stored in the distributed memory cache, which can be accessed transparently and delivers data with high bandwidth. We describe the architecture of the framework and evaluate its performance for a use case scenario.
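A read-through cache of the general kind the framework provides can be sketched as follows; the hash-based placement, the in-process dictionaries standing in for memory servers, and all names are illustrative assumptions rather than the framework's actual design.

```python
import hashlib

class ReadThroughCache:
    """A toy read-through cache: look in the distributed memory cache first,
    fall back to the shared filesystem on a miss, then populate the cache."""
    def __init__(self, cache_nodes):
        self.cache_nodes = cache_nodes        # e.g. a list of dicts standing in for memory servers

    def _node_for(self, path):
        # Consistent placement: hash the file path onto one of the cache nodes.
        h = int(hashlib.sha1(path.encode()).hexdigest(), 16)
        return self.cache_nodes[h % len(self.cache_nodes)]

    def read(self, path):
        node = self._node_for(path)
        if path in node:                      # cache hit: served from memory
            return node[path]
        with open(path, "rb") as f:           # cache miss: go to the filesystem
            data = f.read()
        node[path] = data                     # populate for later readers
        return data
```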
It is currently possible to build multiprocessor systems which will support the tightly coupled activity of hundreds to thousands of different instruction streams, or processes. This can be done by coupling many monoprocessors, or a smaller number of pipelined multiprocessors, through a high concurrency switching network. The switching network may couple processors to memory modules, resulting in a shared memory multiprocessor system, or it may couple processor/memory pairs, resulting in a distributed memory system.
The need to direct the activity of very many processes simultaneously places qualitatively different demands on a programming language than directing a single process does. In spite of the different requirements, most languages for multiprocessors have been simple extensions of conventional, single-stream programming languages. The extensions are often implemented by way of subroutine calls and have little impact on the basic structure of the language. This paper attempts to examine the underlying conceptual structure of parallel languages for large-scale multiprocessors on the basis of an existing language for shared memory multiprocessors, known as the Force, and to extend the concepts in this language to distributed memory systems.
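For readers unfamiliar with the style, the generic Python sketch below shows single-program, barrier-synchronized execution, the model that languages like the Force express with language-level constructs rather than library calls; the multiprocessing-based harness, the rank-based work split, and the shared array are illustrative and are not the Force's syntax or this paper's proposal.

```python
from multiprocessing import Array, Barrier, Process

def worker(rank, nprocs, barrier, data):
    """Every process runs the same program text (SPMD); work is split by rank,
    and a barrier keeps all processes in step before the result is used."""
    chunk = len(data) // nprocs
    for i in range(rank * chunk, (rank + 1) * chunk):
        data[i] = i * i                      # each process fills its own slice
    barrier.wait()                           # all slices complete before anyone continues
    if rank == 0:
        print(sum(data))

if __name__ == "__main__":
    nprocs = 4
    barrier = Barrier(nprocs)
    data = Array("d", 16)                    # shared array visible to all processes
    procs = [Process(target=worker, args=(r, nprocs, barrier, data)) for r in range(nprocs)]
    for p in procs: p.start()
    for p in procs: p.join()
```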
ISBN (Print): 9798350383782; 9798350383799
Multivariate time series (MTS) classification has been tackled using various methods, including reservoir computing (RC), which generates efficient vectorized representations such as the reservoir state (RS). RS shines when handling large numbers of classes or large training sets, but demands long processing times and substantial memory. To address this, we present the Parallel Reservoir Echo State Network (PR-ESN), an optimized parallel training and evaluation algorithm rooted in the ESN principle. It leverages both CPU shared memory and a parallel distributed memory architecture to efficiently capture the reservoir state's optimal model space representation, addressing the computational challenges in MTS analysis. Distinguishing itself from previous work, PR-ESN combines distributed parallel processing at the network level with shared memory multiprocessing at the node level, which reduces memory requirements and speeds up processing. Key features include PR-ESN's distributed training and evaluation, shared memory parallelization, and MSR concatenation for comprehensive analysis of distributed model space representations. Tests on real-world MTS and benchmark ECG data show that PR-ESN-based classifiers achieve superior accuracy and faster processing times with optimal memory usage.
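The ESN principle the abstract builds on can be summarized by the classic reservoir-state update x_t = tanh(W_in u_t + W x_{t-1}). The sketch below implements just this update for a single series; the reservoir size, weight scaling, and spectral-radius normalization are chosen for illustration, and none of PR-ESN's distributed or shared memory parallelization is shown.

```python
import numpy as np

def reservoir_states(inputs, n_reservoir=100, spectral_radius=0.9, seed=0):
    """Run a basic echo state network reservoir over one multivariate series.

    inputs: array of shape (T, n_features). Returns states of shape (T, n_reservoir).
    Only the classic ESN update x_t = tanh(W_in u_t + W x_{t-1}) is shown here.
    """
    rng = np.random.default_rng(seed)
    T, n_features = inputs.shape
    W_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_features))
    W = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))   # enforce the echo state property
    x = np.zeros(n_reservoir)
    states = np.empty((T, n_reservoir))
    for t in range(T):
        x = np.tanh(W_in @ inputs[t] + W @ x)
        states[t] = x
    return states

# Example: a 200-step, 3-channel series mapped to 100-dimensional reservoir states
print(reservoir_states(np.random.randn(200, 3)).shape)
```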
ISBN (Print): 9781665445139
Byte-addressable non-volatile memory (NVM) technologies promise higher density and lower cost than DRAM, and they have been increasingly employed for data center applications. Despite many previous studies on using NVM in a single machine, challenges remain in making the best use of it in a distributed data center environment. This paper presents Gengar, an RDMA-enabled distributed shared hybrid memory (DSHM) pool with simple programming APIs that expose remote NVM and DRAM as a global memory space. We propose to exploit the semantics of RDMA primitives to identify frequently accessed data in the hybrid memory pool and cache it in distributed DRAM buffers. We redesign the RDMA communication protocol to reduce the bottleneck of RDMA write latency by leveraging a proxy mechanism. Gengar also supports memory sharing among multiple users with data consistency guarantees. We evaluate Gengar on a real testbed equipped with Intel Optane DC Persistent Memory DIMMs. Experimental results show that Gengar significantly improves the performance of public benchmarks such as MapReduce and YCSB by up to 70% compared with state-of-the-art DSHM systems.
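The caching idea, keeping all data in NVM while promoting frequently accessed items into DRAM, can be pictured with the toy pool below; the access counters, the tiny eviction-free cache, and the in-process dictionaries are illustrative assumptions, not Gengar's RDMA-based mechanisms.

```python
from collections import Counter

class HybridMemoryPool:
    """A toy hybrid pool: all objects live in (slow) NVM; the most frequently
    read ones are also kept in a small (fast) DRAM cache, chosen by access count."""
    def __init__(self, dram_slots=2):
        self.nvm = {}                 # stands in for the remote NVM region
        self.dram = {}                # stands in for the distributed DRAM buffers
        self.hits = Counter()
        self.dram_slots = dram_slots

    def write(self, key, value):
        self.nvm[key] = value
        self.dram.pop(key, None)      # keep the cache from serving stale data

    def read(self, key):
        self.hits[key] += 1
        if key in self.dram:
            return self.dram[key]     # fast path: served from DRAM
        value = self.nvm[key]
        hottest = {k for k, _ in self.hits.most_common(self.dram_slots)}
        if key in hottest:            # promote frequently read data to DRAM
            self.dram[key] = value
        return value
```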
ISBN (Print): 9781479954964
Large graph analysis is one of the significant applications of distributed computing frameworks. Distributed computing applications are solved by developing programs over different types of established distributed computing frameworks. Since graph analysis and prediction is one of the new trends in data analytics, formulating such problems on an in-memory cluster framework that consumes graph datasets plays a significant role in distributed computing. Traditional disk-based distributed computing frameworks like Hadoop are confined to a specific group of problems in data analytics. Utilizing the memory of the cluster, in addition to disk-based storage, plays a significant role in reducing latency and increasing speedup. This work describes the significance of the Spark framework for solving graph-related problems in a distributed manner, using the PageRank algorithm and a proteome-protein annotation method implemented in Scala.
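For reference, the iterative PageRank computation applied in the work looks like the minimal single-machine sketch below; the example graph, damping factor, and iteration count are illustrative, and the paper's own implementation is in Scala on Spark rather than plain Python.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterative PageRank over an adjacency list {node: [outgoing neighbours]}."""
    nodes = set(links) | {n for outs in links.values() for n in outs}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        contrib = {n: 0.0 for n in nodes}
        for node, outs in links.items():
            for n in outs:                       # spread this node's rank over its out-links
                contrib[n] += rank[node] / len(outs)
        rank = {n: (1 - damping) / len(nodes) + damping * c for n, c in contrib.items()}
    return rank

# Example: scores for a small four-page link graph
print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"], "d": ["a", "c"]}))
```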
ISBN (Print): 0818656808
A distributed shared memory system provides the abstraction of a shared address space on either a network of workstations or a distributed-memory multiprocessor. Although a distributed shared memory system can improve performance by relaxing the memory consistency model and maintaining memory coherence at a granularity specified by the programmer, the challenge is to offer ease of programming while maintaining high performance. Concord meets this challenge by carefully splitting responsibilities among the programmer, the compiler, and the runtime system. Concord has allowed a single programmer to port several real, large shared-memory parallel programs onto an Intel iPSC/2 in a few weeks and achieve reasonable speedup.
ISBN (Print): 9780769550220
Parallel programmers face the often irreconcilable goals of programmability and performance. HPC systems use distributed memory for scalability, thereby sacrificing the programmability advantages of shared memory programming models. Furthermore, the rapid adoption of heterogeneous architectures, often with non-cache-coherent memory systems, has further increased the challenge of supporting shared memory programming models. Our primary objective is to define a memory consistency model that presents the familiar thread-based shared memory programming model, but allows good application performance on non-cache-coherent systems, including distributed memory clusters and accelerator-based systems. We propose regional consistency (RegC), a new consistency model that achieves this objective. Results on up to 256 processors for representative benchmarks demonstrate the potential of RegC in the context of our prototype distributed shared memory system.