Parallel programming and distributed programming involve substantial amounts of boilerplate code for process management and data synchronisation. This leads to increased bug potential and often results in unintended non-deterministic program behaviour. Moreover, algorithmic details are mixed with technical details concerning parallelisation and distribution. Process calculi are formal models for parallel and distributed programming but often leave details open, causing a gap between formal model and implementation. We propose a fully deterministic process calculus for parallel and distributed programming and implement it as a domain-specific language in Haskell to address these problems. We eliminate boilerplate code by abstracting from the exact notion of parallelisation and encapsulating it in the implementation of our process combinators. Furthermore, we achieve correctness guarantees regarding process composition at compile time through Haskell's type system. Our result can be used as a high-level tool to implement parallel and distributed programs.
Presents a collection of slides covering the following topics: CUDA parallel programming model; CUDA toolkit and libraries; performance optimization; and application development.
NVIDIA's CUDA is a general-purpose, scalable parallel programming model for writing highly parallel applications. It provides several key abstractions: a hierarchy of thread blocks, shared memory, and barrier synchronization. This model has proven quite successful at programming multithreaded many-core GPUs and scales transparently to hundreds of cores: scientists throughout industry and academia are already using CUDA to achieve dramatic speedups on production and research codes. In this paper, we propose a hybrid parallel programming approach combining CUDA and MPI, which partitions loop iterations according to the number of C1060 GPU nodes in a GPU cluster consisting of one C1060 and one S1070. Loop iterations assigned to an MPI process are processed in parallel by CUDA, run by the processor cores in the same computational node.
ISBN (digital): 9798350356038
ISBN (print): 9798350356045
We provide a detailed evaluation of several parallel programming models, emphasizing both performance and energy efficiency in heterogeneous computing systems. The evaluation employs a diverse array of hardware, including Intel Xeon and AMD Epyc CPUs, along with NVIDIA GPUs featuring Pascal, Turing, and Ampere architectures, and an AMD GPU with Vega10 architecture. We utilize SYCL, OpenMP, CUDA, and HIP for implementing benchmarks in 11 varied application domains, offering a comprehensive perspective on the capabilities of these programming models in diverse computing environments.
Shape theory is a new approach to data types and programming based on the separation of a data type into its "shape" and "data" parts. Shape is common in parallel computing. This paper identifies areas where the explicit use of shape reduces the burden of programming a parallel computer, using examples from an implementation of Cholesky decomposition.
Many advanced real-time robot control systems use multiprocessor parallelism to provide the necessary computing power and low response time to external events. Multiprocessor parallelism requires the decomposition of the control software into parallel processes. A natural and efficient way to parallelize the control software is pipelining: data are transformed by the different stages of the pipeline, starting from the high-level user specification down to the low-level control signals. The pipeline reflects the hierarchical structure of the software and, at the same time, allows the use of true hardware parallelism. This approach works well in applications with a single direction of information flow. However, in systems with feedback loops, the pipeline delay causes correction signals to be computed on stale data. To solve this problem one could omit the buffers and use machine language, but then all advantages of concurrent high-level languages are lost. The solution proposed in the paper preserves the advantages of parallel asynchronous processes written in a high-level language. It is based on a decomposition of the global control strategy into a nested control structure, akin to human reflexes. The inner structure generates a fast, autonomous, but approximate response to external stimuli, while the outer structure is responsible for slower but more accurate behavior. The linearized entities of the inner loop can be updated in parallel at a much lower rate than the rate at which they are used.
ISBN (print): 0818686030
The paper reports the design of a runtime library for data-parallel programming on clusters of symmetric multiprocessors (SMP clusters). Our design algorithms exploit a hybrid methodology which maps directly onto the underlying hierarchical memory system of SMP clusters by combining two programming styles: threads (shared-memory programming) within an SMP node and message passing between SMP nodes. This hybrid approach has been used in the implementation of a library for collective communications. The prototype library is implemented on top of standard interfaces for threads (pthread) and message passing (MPI). Experimental results on a cluster of Sun UltraSparc-II workstations are reported.
A software environment called Parade (Parallel And Distributed Environment) for parallel programming is proposed. Its main objective is to make programming as easy as possible. The development, debugging, execution, monitoring and optimization of parallel programs have been addressed. Several aspects of parallel program execution, such as task assignment to processors, task synchronization and communication, are handled by the system automatically. Two types of parallel system architectures have been considered: tightly coupled (i.e. multiprocessor systems) and loosely coupled with distributed memory (i.e. multicomputer systems). A user-friendly visual interface, realized with OSF/Motif, is provided for all phases of parallel program development. A prototype of our environment is running on a Meiko Transputer-based system and on a network of Unix-based workstations.
In the paper, a functional parallel programming system for clusters and multicore computers is discussed. It includes a parallel programming language, program development tools, and tools for controlling parallel execution on the computer system. The central part of the system is the original compositional functional parallel programming language FPTL (Functional Parallel Typified Language).