The OpenCL standard offers a common API for program execution on systems composed of different types of computational devices such as multicore CPUs, GPUs, or other accelerators.
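As a purely illustrative sketch of that common API (not taken from the paper), the plain C++ snippet below uses the standard OpenCL C bindings to enumerate every platform and device on a machine, whether the device is a multicore CPU, a GPU, or another accelerator; error handling is deliberately minimal.

// Minimal sketch: list all OpenCL platforms and devices on the host.
// Uses only the standard OpenCL C API (CL/cl.h); error codes are ignored
// for brevity, so a real program should check every returned cl_int.
#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main() {
    cl_uint num_platforms = 0;
    clGetPlatformIDs(0, nullptr, &num_platforms);
    std::vector<cl_platform_id> platforms(num_platforms);
    clGetPlatformIDs(num_platforms, platforms.data(), nullptr);

    for (cl_platform_id p : platforms) {
        cl_uint num_devices = 0;
        clGetDeviceIDs(p, CL_DEVICE_TYPE_ALL, 0, nullptr, &num_devices);
        if (num_devices == 0) continue;
        std::vector<cl_device_id> devices(num_devices);
        clGetDeviceIDs(p, CL_DEVICE_TYPE_ALL, num_devices, devices.data(), nullptr);

        for (cl_device_id d : devices) {
            char name[256] = {0};
            clGetDeviceInfo(d, CL_DEVICE_NAME, sizeof(name), name, nullptr);
            std::printf("device: %s\n", name);  // CPU, GPU, or other accelerator
        }
    }
    return 0;
}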
In task-parallel programs, diverse activities can take place concurrently, and communication and synchronization patterns are complex and not easily predictable. Previous work has identified compositionality as an important design principle for task-parallel programs. In this article, we discuss alternative approaches to the realization of this principle, which holds that properties of program components should be preserved when those components are composed in parallel with other program components. We review two programming languages, Strand and Program Composition Notation, that support compositionality via a small number of simple concepts, namely, monotone operations on shared objects, a uniform addressing mechanism, and parallel composition. Both languages have been used extensively for large-scale application development, allowing us to provide an informed assessment of both their strengths and their weaknesses. We observe that while compositionality simplifies development of complex applications, the use of specialized languages hinders reuse of existing code and tools and the specification of domain decomposition strategies. This suggests an alternative approach based on small extensions to existing sequential languages. We conclude the article with a discussion of two languages that realize this strategy.
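As a loose, hypothetical analogue of these ideas in a mainstream language (this is not Strand or PCN syntax), the C++ sketch below composes two components in parallel and lets them communicate through a single-assignment variable via std::promise and std::future. The write is monotone in the sense that it only ever moves the shared object from "undefined" to one fixed value, so the result does not depend on how the components are scheduled once composed.

// Hypothetical sketch only: parallel composition of two components that share
// a single-assignment ("monotone") variable, in the spirit of Strand/PCN but
// written with standard C++ threads and futures.
#include <future>
#include <iostream>
#include <thread>

int main() {
    std::promise<int> slot;                     // write-once shared object
    std::future<int> view = slot.get_future();  // read side blocks until defined

    // Component A: defines the variable exactly once (a monotone update).
    std::thread producer([&slot] { slot.set_value(42); });

    // Component B: consumes the value; it can never observe a partial state.
    std::thread consumer([&view] { std::cout << "got " << view.get() << "\n"; });

    // Parallel composition: both components run concurrently, and the
    // program's outcome is independent of which one runs first.
    producer.join();
    consumer.join();
    return 0;
}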
The use of programming patterns is considered a conceptual aid that helps programmers develop concurrent and parallel code that is understandable, testable, well structured, and safe. By using programming patterns and their implementations as computer programs, difficult new concepts can be taught smoothly in lectures to students who, before trying this teaching approach, would have been reluctant to enroll in parallel and concurrent programming courses. The approach presented in this paper consists of changing the traditional programming teaching and learning model to one where students are first introduced to syntactic constructs through selected introductory code patterns. In the theory lessons that follow, using laptops with multi-core processors and access to the university's Virtual Campus services, students are able to implement and master the new concepts as they are taught. This teaching experiment was applied to a concurrent and real-time programming course that is part of the computer engineering (CE) degree and is taught during the third semester of the CE curriculum. Evaluation of the students' academic performance under this approach revealed a 20.6% improvement in end-of-course grades.
Heterogeneous platforms composed of multiple different types of computing devices (such as CPUs, GPUs, and Intel MICs) have been widely used recently. However, most parallel applications developed on such heterogeneous platforms utilize only one kind of computing device, due to the lack of easy-to-use heterogeneous cooperative parallel programming models. To reduce the difficulty of heterogeneous cooperative parallel programming, a directive-based heterogeneous cooperative parallel programming framework called HeteroPP is proposed. HeteroPP provides an easier way for programmers to fully exploit multiple different types of computing devices to perform data-parallel applications concurrently and cooperatively on heterogeneous platforms. An extension to OpenMP directives and clauses is proposed to make it possible for programmers to easily offload a data-parallel compute kernel to multiple different types of computing devices. A source-to-source compiler is designed to automatically generate multiple device-specific compute kernels that can be executed concurrently and cooperatively on heterogeneous platforms. Experiments are conducted with 12 typical data-parallel applications implemented with HeteroPP on a heterogeneous CPU-GPU-MIC platform. The results show that HeteroPP not only greatly simplifies heterogeneous cooperative parallel programming, but also fully utilizes the CPUs, GPU, and MIC to run these applications efficiently.
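HeteroPP's own directives are not given in the abstract, so the sketch below shows only the standard OpenMP baseline such an extension builds on: offloading a data-parallel loop to a single accelerator with a target directive. HeteroPP's contribution, per the abstract, is spreading such a kernel across several device types at once; the function name and compile flags here are assumptions, not part of the paper.

// Standard OpenMP target offload of a data-parallel kernel to one device.
// This is not HeteroPP syntax, only the baseline it generalizes. Build with
// an offload-capable OpenMP compiler (e.g. clang++ -fopenmp plus a target flag).
// Computes y[i] = a*x[i] + y[i] on the device.
void saxpy(int n, float a, const float* x, float* y) {
    // Map x to the device, map y both ways, and distribute the loop
    // across the device's teams and threads.
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}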
Haskell is a modern, functional programming language with an interesting story to tell about parallelism: rather than using concurrent threads and locks, Haskell offers a variety of libraries that enable concise, high-level parallel programs with results that are guaranteed to be deterministic (independent of the number of cores and the scheduling being used).
New computational techniques for simulating large arrays of wind turbines are needed to model modern electrical grid networks. In this paper, an implementation of a doubly fed induction generator wind turbine model solver is proposed. The solver runs on an NVIDIA graphics processing unit and is coded using the compute unified device architecture (CUDA). The implementation integrates a linear time-invariant system represented by state-space matrices. A CUDA kernel has been implemented that is capable of simulating many wind turbines in parallel, each with a different wind profile and configuration. Strategies such as optimizing memory access and overlapping data transfers with kernel execution were used to obtain the results. The CUDA implementation reaches an occupancy of 95% while simulating 500 wind turbines, where each unit is subject to a different wind profile or uses different configuration parameters.
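The abstract does not state which integration scheme the kernel uses, so the sketch below is only a hypothetical per-turbine update: one forward-Euler step of a linear time-invariant state-space model x' = Ax + Bu, written in plain C++. In a GPU version along the lines the paper describes, one such update per turbine would presumably run in its own thread; the dimensions here are illustrative.

// Hypothetical per-turbine step: forward-Euler integration of a linear
// time-invariant state-space model x' = A x + B u (the abstract does not
// specify the actual integrator). Plain sequential C++ for illustration.
#include <array>

constexpr int N = 4;  // illustrative state dimension, not from the paper
constexpr int M = 2;  // illustrative input dimension (e.g. wind speed, grid voltage)

using State = std::array<double, N>;
using Input = std::array<double, M>;
using MatA  = std::array<std::array<double, N>, N>;
using MatB  = std::array<std::array<double, M>, N>;

// One Euler step: x <- x + dt * (A x + B u)
void step(State& x, const Input& u, const MatA& A, const MatB& B, double dt) {
    State dx{};
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) dx[i] += A[i][j] * x[j];
        for (int j = 0; j < M; ++j) dx[i] += B[i][j] * u[j];
    }
    for (int i = 0; i < N; ++i) x[i] += dt * dx[i];
}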
Whilst there have been great advances in HPC hardware and software in recent years, the languages and models that we use to program these machines have remained much more static. This is not from a lack of effort, but because the foundation that many programming languages are built on is not sufficient for the level of expressivity required for parallel work. The result is an implicit trade-off between programmability and performance, made worse by the fact that, whilst many scientific users are experts within their own fields, they are not HPC experts. Type-oriented programming looks to address this by encoding the complexity of a language via the type system. Most of the language functionality is contained within a loosely coupled type library that can be flexibly used to control many aspects such as parallelism. Due to the high-level nature of this approach, much information is available during compilation that can be used for optimisation, and, in the absence of type information, the compiler can apply sensible default options, thus supporting expert programmers and novices alike. We demonstrate that codes written in this type-oriented manner provide improved programmability at no performance or scalability penalty when running on up to 8196 cores of a Cray XE6 system. The programmer is able to write simple, implicitly parallel HPC code at a high level and then tune it explicitly by adding additional type information if required.
The efficient utilization of the available resources in modern parallel computing systems requires advanced parallel programming expertise. However, parallel programming is more difficult than sequential programming. To alleviate the difficulties of parallel programming, high-level programming frameworks, such as OpenMP, have been proposed. Yet, there is evidence that novice parallel programmers make common mistakes that may lead to performance degradation or unexpected program behavior. In this paper, we present our cognitive parallel programming assistant (PAPA), which aims to educate and assist novice parallel programmers in avoiding common OpenMP mistakes. PAPA combines different IBM Watson services to provide a dialog-based interaction (through text and voice) for programmers. We use the Watson Conversation service to implement the dialog-based interaction, and the Speech-to-Text and Text-to-Speech services to enable the voice interaction. The Watson Natural Language Understanding and WordsAPI Synonyms services are used to train PAPA with OpenMP-related publications. We evaluate our approach using a user experience questionnaire with a number of novice parallel programmers at Linnaeus University.
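The abstract does not enumerate the mistakes PAPA targets, but a textbook example of the kind of novice OpenMP error it describes is an unsynchronized shared accumulator; the hypothetical C++ sketch below shows the race and the usual reduction-clause fix.

// Illustrative only: a classic novice OpenMP mistake and its fix. This is
// not a list drawn from the paper. Compile with -fopenmp.
#include <cstdio>

int main() {
    const int n = 1000000;
    double buggy = 0.0, fixed = 0.0;

    // Mistake: 'buggy' is shared and updated without synchronization, so
    // concurrent += operations race and the result is nondeterministic.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) buggy += 1.0;

    // Fix: let OpenMP keep a private partial sum per thread and combine them.
    #pragma omp parallel for reduction(+ : fixed)
    for (int i = 0; i < n; ++i) fixed += 1.0;

    std::printf("buggy=%f fixed=%f (expected %d)\n", buggy, fixed, n);
    return 0;
}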
A system called Agora was designed and implemented that supports the development of multilanguage parallel applications for heterogeneous machines. Agora hinges on two ideas: the first one is that shared memory can be a suitable abstraction to program concurrent, multilanguage modules running on heterogeneous machines. The second idea is that a shared memory abstraction can be efficiently supported across different computer architectures that are not connected by a physical shared memory, e.g., local area network workstations or ensemble machines. Agora has been in use for more than a year. The authors describe the Agora shared memory and its software implementation on both tightly and loosely coupled architectures. Measurements of the current implementation are also included.
Actor model-based design is actively researched for parallel embedded software design since the model exposes the potential parallelism explicitly in an architecture-neutral form. In most actor-oriented models, actors are self-contained, data channels are the only sharable objects between actors, and the actors compose a system in a flat layer. In contrast, it is common practice to use shared library functions and to construct vertically layered software for efficiency and modularity. To fill this gap between modeling and implementation, we propose a special actor, the library task, with new types of ports: the library master port and the library slave port. It is a sharable and mappable object that defines a set of function interfaces internally. An N:1 master-slave connection allows a library task to be shared, and the master-slave connection can naturally specify vertically layered software and client-server applications. To support the library task in our embedded software design environment, we develop an automatic mapping algorithm as well as an automatic code generator. The design environment with the library task is applied to two target platforms: the IBM CELL Broadband Engine and an ARM-based multicore simulator. Preliminary experiments show that the special actor, or library task, extends the expressive power of the previous actor model while generating efficient code.