ISBN: 9781450368186 (print)
The current state of the art in GPU networking uses a host-centric, kernel-boundary communication model that reduces performance and increases code complexity. To address these concerns, recent works have explored performing network operations from within a GPU kernel itself. However, these approaches typically involve the CPU in the critical path, which leads to high latency and inefficient utilization of network and/or GPU resources. In this work, we introduce GPU Initiated OpenSHMEM (GIO), a new intra-kernel PGAS programming model and runtime that enables GPUs to communicate directly with a NIC without the intervention of the CPU. We accomplish this by exploring the GPU's coarse-grained memory model and correcting semantic mismatches when GPUs wish to directly interact with the network. GIO also reduces latency by relying on a novel template-based design to minimize the overhead of initiating a network operation. We illustrate that for structured applications like a Jacobi 2D stencil, GIO can improve application performance by up to 40% compared to traditional kernel-boundary networking. Furthermore, we demonstrate that on irregular applications like Sparse Triangular Solve (SpTS), GIO provides up to 44% improvement compared to existing intra-kernel networking schemes.
ISBN: 9781605583976 (print)
Assuming that the multicore revolution plays out the way the microprocessor industry expects, it seems that within a decade most programming will involve parallelism at some level. One needs to ask how this affects the way we teach computer science, or even how we have people think about computation. With regard to teaching there seem to be three basic choices: (1) we only train a small number of experts in parallel computation who develop a collection of libraries, and everyone else just uses them; (2) we leave our core curriculum pretty much as is, but add some advanced courses on parallelism or perhaps tack on a few lectures at the end of existing courses; or (3) we start teaching parallelism from the start and embed it throughout the curriculum with the idea of getting students to think about parallelism as the most natural form of computation and sequential computation as a special case. This talk will examine some of the implications of the third option. It will argue that thinking about parallelism, when treated in an appropriate way, might be as easy as or easier than thinking sequentially. A key prerequisite, however, is to identify what the core ideas in parallelism are and how they might be layered and integrated with existing concepts. Another, more difficult issue is how to cleanly integrate these ideas among courses. After all, much of the success of sequential computation follows from the concept of a random access machine and its ability to serve as a simple, albeit imperfect, interface between programming languages, algorithm analysis, and hardware design. The talk will go through an initial list of some core ideas in parallelism, and an approach to integrating these ideas between parallel algorithms, programming languages, and, to some extent, hardware. This requires, however, moving away from the concept of a machine model as an interface for thinking about computation.