This paper presents GPotion, a DSL for GPU programming embedded in the Elixir functional language. GPotion allows programmers to write low-level GPU kernels, similar to CUDA kernels, in Elixir but also provides high-l...
We present and evaluate the Futhark implementation of reverse-mode automatic differentiation (AD) for the basic blocks of parallel programming: reduce, prefix sum (scan), and reduce-by-index (multi-reduce). We present...
Parallel programming remains a daunting challenge, from the struggle to express a parallel algorithm without cluttering the underlying synchronous logic, to describing which devices to employ in a calculation, to corr...
A sorted set (or map) is one of the most used data types in computer science. In addition to standard set operations, like Insert, Remove, and Contains, it can provide set-set operations such as Union, Intersection, a...
A popular approach to program scalable irregular applications is Asynchronous Many-Task (AMT) programming. Here, programs define tasks according to task models such as dynamic independent tasks (DIT) or nest...
We investigated the performance impact of IEEE-754 double-precision floating-point subnormal numbers, focusing on vector arithmetic and transcendental functions across Intel, AMD, and HiSilicon CPUs. We developed a be...
Performance analysis tools are frequently used to support the development of parallel MPI applications. They facilitate the detection of errors, bottlenecks, or inefficiencies but differ substantially in their instrum...
Work stealing is a well-known technique for dynamic load balancing; however, manually writing work-stealing protocols is error-prone. We can use the Tascell parallel programming language for the correct and portable imp...
Certain workloads, such as in-memory databases, are inherently hard to scale out and rely on cache-coherent scale-up non-uniform memory access (NUMA) systems to keep up with the ever-increasing demand for compute resour...
Manycore architectures integrate hundreds of cores on a single chip by using simple cores and simple memory systems usually based on software-managed scratchpad memories (SPMs). However, such architectures are notorio...