ISBN: (Print) 9781450368186
Current state-of-the-art in GPU networking utilizes a host-centric, kernel-boundary communication model that reduces performance and increases code complexity. To address these concerns, recent works have explored performing network operations from within a GPU kernel itself. However, these approaches typically involve the CPU in the critical path, which leads to high latency and inefficient utilization of network and/or GPU resources. In this work, we introduce GPU Initiated OpenSHMEM (GIO), a new intra-kernel PGAS programming model and runtime that enables GPUs to communicate directly with a NIC without the intervention of the CPU. We accomplish this by exploring the GPU's coarse-grained memory model and correcting semantic mismatches when GPUs wish to directly interact with the network. GIO also reduces latency by relying on a novel template-based design to minimize the overhead of initiating a network operation. We illustrate that for structured applications like a Jacobi 2D stencil, GIO can improve application performance by up to 40% compared to traditional kernel-boundary networking. Furthermore, we demonstrate that on irregular applications like Sparse Triangular Solve (SpTS), GIO provides up to 44% improvement compared to existing intra-kernel networking schemes.
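The Jacobi 2D stencil cited above is a structured halo-exchange workload: each sweep averages a point's four neighbors, and only the boundary (halo) rows and columns must be communicated between processing elements. A minimal single-process C++ sketch of one sweep (function and grid names are illustrative, not from the paper; in GIO the halo data would arrive via intra-kernel OpenSHMEM operations rather than being local):

```cpp
#include <vector>
#include <cstddef>

// One Jacobi sweep over the interior of an (n+2) x (n+2) grid stored
// row-major with a one-cell halo. In a distributed run, the halo cells
// would be filled by neighbor exchange before each sweep.
void jacobi_sweep(const std::vector<double>& in, std::vector<double>& out,
                  std::size_t n) {
    const std::size_t stride = n + 2;
    for (std::size_t i = 1; i <= n; ++i)
        for (std::size_t j = 1; j <= n; ++j)
            out[i * stride + j] = 0.25 * (in[(i - 1) * stride + j] +
                                          in[(i + 1) * stride + j] +
                                          in[i * stride + j - 1] +
                                          in[i * stride + j + 1]);
}
```

Because the halo exchange sits on the critical path of every sweep, moving its initiation from the host into the kernel, as GIO does, directly shortens each iteration.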
ISBN: (Print) 9781450362306
The automotive industry is embracing new challenges to deliver self-driving cars, and this in turn requires increasingly complex hardware and software. Software developers are leveraging artificial intelligence, and in particular machine learning, to deliver the capabilities required for an autonomous vehicle to operate. This has driven automotive systems to become increasingly heterogeneous, offering multi-core processors and custom co-processors capable of performing the intense algorithms required for artificial intelligence and machine learning. These new processors can be used to vastly speed up common operations used in AI (Artificial Intelligence) and machine learning.
The R-Car V3H system-on-chip (SoC) from the Renesas Autonomy™ platform for ADAS (Advanced Driver Assistance Systems) and automated driving supports Level 3 and above (as defined by SAE's automation level definitions). It follows the heterogeneous IP concept of the Renesas Autonomy™ platform, giving the developer the choice of high-performance computer vision at low power consumption, as well as the flexibility to implement the latest algorithms such as those used in machine learning.
By examining the architecture of the R-Car hardware we can understand how it differs from HPC and desktop heterogeneous systems, and how it can be mapped to the SYCL and OpenCL programming models. When both power consumption and performance are important, as is the case in the automotive industry, implementing OpenCL and SYCL on these hardware platforms requires a balanced approach. The memory capacity and layout must be used in the most effective way to build a pipeline that provides the best throughput. The R-Car hardware provides DMA and on-chip memory, which are used to facilitate efficient data transfer on the device, and its memory hierarchy maps efficiently to OpenCL.
The R-Car hardware also offers many fixed-function IP blocks, each performing …
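The balanced pipeline the abstract describes, overlapping data movement with compute via DMA and on-chip memory, follows the standard double-buffering pattern. A host-side C++ sketch under stated assumptions (the tile layout, function names, and the plain-copy stand-in for the DMA engine are illustrative; the actual R-Car DMA API is not shown):

```cpp
#include <vector>
#include <cstddef>

// Double-buffered tile pipeline: while the accelerator works on one
// buffer, the next tile is staged into the other. The staging copy here
// stands in for a DMA transfer into on-chip memory; the "compute" is a
// simple sum so the pattern stays self-contained.
double process_tiles(const std::vector<std::vector<double>>& tiles) {
    std::vector<double> buf[2];
    double acc = 0.0;
    if (tiles.empty()) return acc;
    buf[0] = tiles[0];                        // stage the first tile
    for (std::size_t t = 0; t < tiles.size(); ++t) {
        if (t + 1 < tiles.size())
            buf[(t + 1) % 2] = tiles[t + 1];  // "DMA" the next tile
        for (double v : buf[t % 2])           // compute on current tile
            acc += v;
    }
    return acc;
}
```

On hardware with an asynchronous DMA engine, the staging copy and the compute loop run concurrently, which is what keeps the pipeline's throughput bounded by the slower of the two stages rather than their sum.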
In this paper, we review the background and the state of the art of the distributed computing software stack. We aim to provide the readers with a comprehensive overview of this area by supplying a detailed big-picture of the latest technologies. First, we introduce the general background of distributed computing and propose a layered top-bottom classification of the latest available software. Next, we focus on each abstraction layer, i.e. Application Development (including Task-based Workflows, Dataflows, and Graph Processing), Platform (including Data Sharing and Resource Management), Communication (including Remote Invocation, Message Passing, and Message Queuing), and Infrastructure (including Batch and Interactive systems). For each layer, we give a general background, discuss its technical challenges, review the latest programming languages, programming models, frameworks, libraries, and tools, and provide a summary table comparing the features of each alternative. Finally, we conclude this survey with a discussion of open problems and future directions. (C) 2021 Elsevier Inc. All rights reserved.