The complexity of heterogeneous computing architectures, as well as the demand for productive and portable parallel application development, have driven the evolution of parallel programming models to become more comp...
The Resource Leveling Problem (RLP) is commonly solved by heuristic, meta-heuristic, and mathematical methods. However, these methods cannot guarantee an exact solution for large problems. In this study, the number of feasible schedules that can be obtained by delaying non-critical activities, without violating the precedence relationships or extending the project completion time, is computed. All feasible schedules, which together define the search domain, are enumerated, so the guaranteed optimum solution of the RLP is obtained by a method different from the existing ones. An exponential relationship between the size of the search domain and the number of activities on a serial path is derived, and it is verified that large RLPs cannot be solved in reasonable time on a single central processing unit. Parallel programming is used to partition the problem into equal sizes so that each partition contains the same number of enumerations. Four RLPs, the largest of which has 36 activities, are solved by exhaustive enumeration within reasonable solution time, which demonstrates that the proposed method is applicable. Exact solutions of larger problems can also be obtained by the proposed method if the problem is partitioned into smaller sizes.
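The enumeration-and-partitioning idea in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the four-activity project, the sum-of-squares leveling objective, and all function names are assumptions made for the example. Each candidate schedule is addressed by an integer index, so the search domain can be cut into equally sized index ranges that are searched independently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy project (not from the paper):
# name -> (duration, daily resource demand, earliest start, latest start)
ACTIVITIES = {
    "A": (2, 3, 0, 0),  # critical, no float
    "B": (3, 2, 2, 2),  # critical, no float
    "C": (1, 4, 0, 3),  # non-critical
    "D": (1, 2, 1, 4),  # non-critical, must follow C
}
PRECEDENCE = [("A", "B"), ("C", "D")]  # (predecessor, successor)
HORIZON = 5  # fixed project completion time in days

NAMES = list(ACTIVITIES)
RADICES = [ls - es + 1 for (_, _, es, ls) in ACTIVITIES.values()]
TOTAL = 1
for radix in RADICES:
    TOTAL *= radix  # size of the search domain before the precedence check


def decode(index):
    """Map a candidate index to start days (mixed-radix decoding)."""
    starts = {}
    for name, radix in zip(NAMES, RADICES):
        index, offset = divmod(index, radix)
        starts[name] = ACTIVITIES[name][2] + offset
    return starts


def feasible(starts):
    """A schedule is feasible if every successor starts after its predecessor ends."""
    return all(starts[s] >= starts[p] + ACTIVITIES[p][0] for p, s in PRECEDENCE)


def cost(starts):
    """Assumed leveling objective: sum of squared daily resource usage."""
    usage = [0] * HORIZON
    for name, (duration, demand, _, _) in ACTIVITIES.items():
        for day in range(starts[name], starts[name] + duration):
            usage[day] += demand
    return sum(u * u for u in usage)


def search_chunk(bounds):
    """Exhaustively search one index range; return (feasible count, best candidate)."""
    lo, hi = bounds
    count, best = 0, None
    for index in range(lo, hi):
        starts = decode(index)
        if not feasible(starts):
            continue
        count += 1
        candidate = (cost(starts), tuple(starts[n] for n in NAMES))
        if best is None or candidate < best:
            best = candidate
    return count, best


def solve(workers=4):
    """Split the index space into equal ranges and search them concurrently."""
    step = -(-TOTAL // workers)  # ceiling division
    chunks = [(lo, min(lo + step, TOTAL)) for lo in range(0, TOTAL, step)]
    # Threads keep the sketch simple; a process pool would be the usual
    # choice for CPU-bound enumeration in Python.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(search_chunk, chunks))
    feasible_count = sum(count for count, _ in results)
    best = min(b for _, b in results if b is not None)
    return feasible_count, best


feasible_count, best = solve()
print(feasible_count, best)
```

Because every index range has the same length, each worker performs the same number of enumerations, which mirrors the equal partitioning described above. The search domain is the product of the per-activity start-day ranges, so it grows exponentially with the number of delayable activities.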
ISBN: 9780738143057 (print)
Parallel patterns, views, and spaces are promising abstractions for capturing the programmer's intent as well as the contextual information that an underlying runtime can use to map software efficiently to parallel hardware. These abstractions can be valuable when an algorithm must meet requirements of code and performance portability across hardware architectures and vendor programming models. Kokkos is a parallel programming model for host and accelerator architectures that relies on these abstractions and targets these requirements. It consists of a pure C++ interface, a specification, and a programming library. The programming library exposes patterns and types and maps them to an underlying abstract machine model, which offers a generic view of parallel hardware. While Kokkos is gaining popularity in large-scale HPC applications at some DOE laboratories, we believe that the implemented concepts are of interest to a broader audience, including academia, as they may contribute to a generic, vendor- and architecture-independent education in parallel programming. In this work, we give insight into the design considerations of this programming model and list important abstractions. Further, we document best practices obtained from giving virtual classes on Kokkos and give pointers to resources that the reader may find valuable for a lecture on generic parallel programming for students with preexisting knowledge of the subject.
ISBN: 9781728109305 (print)
Single-board computers have recently grown to offer developers a wide range of options whose common denominators are low power and low cost. In this paper, we present an embedded cluster platform for a remote parallel programming lab to be used in an online course. A remote lab server handles all requests coming from the front end running on an online learning platform and controls the execution of the parallel programming assignments submitted by students. The embedded cluster where the jobs run is made of single-board computers connected to each other and to the lab server through a gigabit network. In our first working prototype, we tested six different state-of-the-art single-board computers, evaluating their processing latency, price, and tool compatibility. We found that the Vim3Pro performed best overall: it was the fastest in most tests, had a mid-range price, and was only two times slower than a much more expensive high-end Xeon processor when using the same number of cores.
ISBN: 9781728165820 (print)
Due to the ever-increasing computational demand of automotive applications, and in particular autonomous driving functionalities, the automotive industry and its suppliers are starting to adopt parallel and heterogeneous embedded platforms for their products. However, C and C++, the currently dominant programming languages in this industry, do not provide sufficient mechanisms to target such platforms. Established parallel programming models such as OpenMP and OpenCL, on the other hand, are tailored towards HPC systems. In this case study, we investigate the applicability of established parallel programming models to automotive workloads on heterogeneous platforms. We pursue a practical approach by re-enacting a typical development process for typical embedded platforms and representative benchmarks.
ISBN: 9781728174457 (print)
This paper presents EASYPAP, an easy-to-use programming environment designed to help students to learn parallel programming. EASYPAP features a wide range of 2D computation kernels that the students are invited to parallelize using Pthreads, OpenMP, OpenCL or MPI. Execution of kernels can be interactively visualized, and powerful monitoring tools allow students to observe both the scheduling of computations and the assignment of 2D tiles to threads/processes. By focusing on algorithms and data distribution, students can experiment with diverse code variants and tune multiple parameters, resulting in richer problem exploration and faster progress towards efficient solutions. We present selected lab assignments which illustrate how EASYPAP improves the way students explore parallel programming.
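The assignment of 2D tiles to threads mentioned in this abstract can be illustrated with a small sketch. It does not use EASYPAP's own API: the function names and the cyclic (round-robin) distribution are assumptions chosen for illustration, and other schedules are equally possible.

```python
def split_into_tiles(dim, tile_size):
    """Cut a dim x dim image into tiles of at most tile_size x tile_size.

    Each tile is (x, y, width, height); tiles on the right and bottom
    borders are clipped when dim is not a multiple of tile_size.
    """
    tiles = []
    for y in range(0, dim, tile_size):
        for x in range(0, dim, tile_size):
            width = min(tile_size, dim - x)
            height = min(tile_size, dim - y)
            tiles.append((x, y, width, height))
    return tiles


def assign_round_robin(tiles, workers):
    """Hypothetical static schedule: tile i goes to worker i modulo workers."""
    owner = {worker: [] for worker in range(workers)}
    for index, tile in enumerate(tiles):
        owner[index % workers].append(tile)
    return owner


tiles = split_into_tiles(10, 4)
assignment = assign_round_robin(tiles, 3)
for worker, owned in assignment.items():
    print(worker, owned)
```

Changing the tile size or the distribution rule changes which worker touches which region of the image, which is the kind of parameter the abstract says students can tune and observe.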
The continued miniaturization of the technology node increases not only the chip capacity but also the circuit design complexity. How does one efficiently design a chip with millions or billions of transistors? This has become a challenging problem in the integrated circuit (IC) design industry, especially for the developers of electronic design automation (EDA) tools. One promising direction for boosting the performance of EDA tools is parallel computing. In this dissertation, we explore different parallel computing approaches, from CPU to GPU to distributed computing, for EDA applications. Multi-core processors are now prevalent, from mobile devices to laptops to desktops, and it is natural for software developers to use the available cores to maximize the performance of their applications. Therefore, in this dissertation we first focus on multi-threaded programming. We begin by reviewing a C++ parallel programming library called Cpp-Taskflow. Cpp-Taskflow is designed to facilitate programming parallel applications and has been successfully applied to an EDA timing analysis tool. We will demonstrate Cpp-Taskflow’s programming model and interface, software architecture, and execution flow. Then, we improve Cpp-Taskflow in several aspects. First, we enhance Cpp-Taskflow’s usability by restructuring the software architecture. Second, we introduce task graph composition to support composability and modularity, which makes it easier for users to construct large and complex parallel patterns. Third, we add a new task type to Cpp-Taskflow that lets users control the graph execution flow. This feature gives the graph model the ability to describe complex control flow. Aside from the above enhancements, we have designed a new scheduler that adaptively manages the threads based on available parallelism. The new scheduler uses a simple and effective strategy that can not only prevent resources from being underutilized, but also mitigate resource over-subscription
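The task-graph programming model described in this abstract can be illustrated in a few lines. The sketch below is not Cpp-Taskflow and does not reproduce its C++ interface or its scheduler. It only shows the general idea of running tasks as soon as their dependencies have finished, using Python's standard `graphlib` (Python 3.9 or later) and a thread pool. The helper `run_task_graph` and the diamond-shaped example graph are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_task_graph(tasks, deps, workers=4):
    """Run callables in dependency order and return their completion order.

    tasks: dict mapping a task name to a zero-argument callable.
    deps:  dict mapping a task name to the set of names it depends on.
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    finished = []
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            # Submit every task whose predecessors have all completed.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any exception from the task
                name = running.pop(future)
                finished.append(name)
                sorter.done(name)  # unlocks the successors of this task
    return finished


# Hypothetical diamond graph: A runs first, then B and C, then D.
log = []
tasks = {name: (lambda n=name: log.append(n)) for name in "ABCD"}
deps = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
order = run_task_graph(tasks, deps)
print(order)
```

B and C have no dependency on each other, so they may run concurrently and finish in either order. Only the constraints encoded in `deps` are guaranteed.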
ISBN: 9781450359337 (print)
SyDPaCC is a set of libraries for the Coq interactive theorem prover. It supports the development of correct functional parallel programs on distributed lists, based on the transformation of naive sequential programs that are treated as specifications. To offer the parallelization of functions on other data structures, the first step is to implement a parallel version of the data structure in question and to provide parallel implementations of the primitive functions that manipulate it. This paper presents such a first step: a binary tree extension that includes new map and reduce pure functional algorithmic skeletons for binary trees. Such algorithmic skeletons are templates of parallel algorithms, realized in a functional context as higher-order functions implemented in parallel. The use of these new primitives is illustrated with example applications.
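To make the idea of map and reduce skeletons on binary trees concrete, here is a minimal sequential sketch. It is written in Python rather than Coq and is not SyDPaCC code: the `Node` type and the names `tree_map` and `tree_reduce` are assumptions for illustration. It plays the role of the naive sequential specification. A parallel skeleton with the same interface could evaluate the left and right subtrees independently.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Node:
    """Immutable binary tree node; an absent child is represented by None."""
    value: object
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def tree_map(f: Callable, tree: Optional[Node]) -> Optional[Node]:
    """Apply f to every value, preserving the shape of the tree."""
    if tree is None:
        return None
    return Node(f(tree.value), tree_map(f, tree.left), tree_map(f, tree.right))


def tree_reduce(combine: Callable, empty, tree: Optional[Node]):
    """Collapse a tree bottom-up.

    combine(left_result, value, right_result) merges a node with the
    results of its two subtrees; empty is the result for a missing child.
    """
    if tree is None:
        return empty
    return combine(
        tree_reduce(combine, empty, tree.left),
        tree.value,
        tree_reduce(combine, empty, tree.right),
    )


tree = Node(1, Node(2), Node(3, Node(4)))
squares = tree_map(lambda v: v * v, tree)
total = tree_reduce(lambda l, v, r: l + v + r, 0, squares)
height = tree_reduce(lambda l, _, r: 1 + max(l, r), 0, tree)
print(total, height)
```

Both skeletons are higher-order functions: the user supplies only `f` or `combine`, and the traversal strategy stays hidden behind the skeleton, which is what allows a parallel implementation to be substituted.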
The course of parallel programming is becoming more and more important for the education of students majoring in computer science. However, it is not easy to learn parallel programming well due to its high theory and ...
It is proposed to add a static system of types to the dataflow functional model of parallel computing and the dataflow functional parallel programming language developed on its basis. The use of static typing increase...