ISBN (print): 9783981537000
The SystemC/TLM technologies are widely accepted in the industry for fast system-level simulation. An important performance limitation of SystemC is that the reference implementation is sequential, and the official semantics make parallel execution difficult. As the number of cores in computers increases quickly, the ability to take advantage of host parallelism during a simulation is becoming a major concern. Most existing work on parallelization of SystemC targets cycle-accurate simulation and would be inefficient on loosely timed systems, since it cannot run in parallel processes that do not execute simultaneously. We propose an approach that explicitly targets loosely timed systems and offers the user a set of primitives to express tasks with duration, as opposed to the notion of time in SystemC, which allows only instantaneous computations and time elapses without computation. Our tool exploits this notion of duration to run the simulation in parallel. It runs on top of any (unmodified) SystemC implementation, which lets legacy SystemC code continue running as is. This allows the user to focus on the performance-critical parts of the program that need to be parallelized.
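As a rough illustration of why attaching a duration to a task exposes parallelism, the sketch below groups tasks whose simulated time intervals intersect and runs each group on a thread pool. It is not the tool described in the abstract and does not use SystemC; the names `DurationTask`, `batches_by_overlap`, and `run_tasks` are hypothetical, and the batching rule is a deliberately coarse simplification.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DurationTask:
    """Hypothetical task with a simulated start time and a duration."""
    name: str
    start: int       # simulated start time (arbitrary units)
    duration: int    # simulated duration; 0 models an instantaneous computation
    work: Callable[[], int]

    @property
    def end(self) -> int:
        return self.start + self.duration


def batches_by_overlap(tasks: List[DurationTask]) -> List[List[DurationTask]]:
    """Greedily group tasks, in start order, while each new task starts
    before the earliest end time already in the current batch."""
    batches: List[List[DurationTask]] = []
    current: List[DurationTask] = []
    earliest_end = 0
    for task in sorted(tasks, key=lambda t: t.start):
        if current and task.start < earliest_end:
            current.append(task)
            earliest_end = min(earliest_end, task.end)
        else:
            if current:
                batches.append(current)
            current = [task]
            earliest_end = task.end
    if current:
        batches.append(current)
    return batches


def run_tasks(tasks: List[DurationTask]) -> Dict[str, int]:
    """Run each batch on host threads; batches themselves run in simulated order."""
    results: Dict[str, int] = {}
    for batch in batches_by_overlap(tasks):
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            futures = {t.name: pool.submit(t.work) for t in batch}
            for name, future in futures.items():
                results[name] = future.result()
    return results
```

With purely instantaneous tasks (duration 0), no two intervals intersect and every batch has a single task, which mirrors the abstract's point that loosely timed models offer little parallelism unless durations are made explicit.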
ISBN (print): 9781538678794
OCaml is a multi-paradigm (functional, imperative, object-oriented) high-level sequential language. Types are statically inferred by the compiler, and the type system is expressive and strong. These features make OCaml a very productive language for developing efficient and safe programs. In this tutorial we present three frameworks for using OCaml to program scalable parallel architectures: BSML, Multi-ML and Spoc.
A visit to the neighborhood PC retail store provides ample proof that we are in the multi-core era. The key differentiator among manufacturers today is the number of cores that they pack onto a single chip. The clock frequency of commodity processors has reached its limit, however, and is likely to stay below 4 GHz for years to come. As a result, adding cores is not synonymous with increasing computational power. To take full advantage of the performance enhancements offered by the new multi-core hardware, a corresponding shift must take place in the software infrastructure - a shift to parallel computing.
This paper develops some ideas expounded in [1]. It distinguishes a number of ways of using parallelism, including disjoint processes, competition, cooperation, and communication. In each case an axiomatic proof rule ...
ISBN (print): 9783540551607
The use of pure functional languages is often quoted as facilitating the production of parallel algorithms. In this paper we examine the claim that programs written in pure functional languages can be easily transformed into programs which are suitable for implementation on a parallel machine. In particular, we investigate the transformation of a program to a form which is suitable for implementation on a pipelined process network, and then annotate the resulting network using a declarative language called Caliban. The case study which is used for this investigation is an abstract programming model which was designed by Banatre and Le Metayer, chosen as it is a non-trivial application which is not obviously well-suited to implementation in a functional language, and which is itself established as a model for concurrency. Program transformation is used to improve the efficiency of the implementation of the model for execution on a loosely-coupled multiprocessor. The parallel execution of the resulting algorithm is simulated in order to obtain estimates of performance measurements, and to gauge the effect of changing the granularity of the process network. Although we find that it is difficult to exploit pipeline parallelism effectively using the transformation techniques investigated here, we see that there is a direct correspondence between programs written in functional languages and distributed process networks, and that this can easily be harnessed by using Caliban. This correspondence, together with the referential transparency of programs written in pure functional languages, indicates that further research into the exploitation of parallelism using these languages is called for.
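The correspondence between a composition of pure functions and a pipelined process network can be sketched as follows. This is an illustration in Python rather than Caliban or the paper's transformation technique: each function becomes a stage running in its own thread, connected to its neighbours by FIFO queues. The names `spawn_stage` and `run_pipeline` are hypothetical.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_DONE = object()  # sentinel marking the end of the stream


def spawn_stage(fn: Callable[[Any], Any],
                inbox: "queue.Queue", outbox: "queue.Queue") -> threading.Thread:
    """Start one pipeline process: read from inbox, apply fn, write to outbox."""
    def loop() -> None:
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # propagate end-of-stream downstream
                return
            outbox.put(fn(item))

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


def run_pipeline(fns: List[Callable[[Any], Any]], items: Iterable[Any]) -> List[Any]:
    """Wire the stages in order and stream the items through them."""
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [spawn_stage(fn, queues[i], queues[i + 1])
               for i, fn in enumerate(fns)]
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results
```

Because each stage is a pure function of its input, the network computes the same result as applying the composed function to every item; only the granularity of the stages affects how much pipeline parallelism is available.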
ISBN (print): 9783319665627; 9783319665610
New sequencing technologies have been rapidly increasing the size of current genomes while reducing their cost. These data need to be processed with efficient and innovative tools using high performance computing (HPC), but taking advantage of today's supercomputers requires parallel programming techniques and strategies. Plant genomes are full of Long Terminal Repeat Retrotransposons (LTR-RT), which are the most frequent repeated sequences; very important agronomic commodities such as Robusta coffee and maize have genomes composed of approximately 50% and approximately 85%, respectively, of this class of mobile elements. New parallel bioinformatics pipelines are making it possible to use whole genomes like these in research projects, generating a lot of new information and impacting in many ways the knowledge that researchers have about them. Here we present the utility of multi-core architectures and parallel programming for analyzing and classifying massive quantities of genomic information up to 16 times faster.
ISBN (print): 9783540551607
We show the relevance of a high-level computational model for the development of correct parallel programs. To this end we derive four programs solving classical problems, some of which are generally considered "inherently sequential". The paper is only concerned with correctness and does not address implementation issues.
ISBN (print): 9781424468904
We propose a unified parallel programming framework which supports both heterogeneity and fault tolerance in MPI programs on a variety of parallel computing platforms. This paper is mainly dedicated to heterogeneity support in our framework. In our framework, a variety of parallel and sequential jobs submitted by multiple users are optimally scheduled on a heterogeneous parallel computing environment. To balance the load among the nodes in such heterogeneous computing environments, some of the parallel processes should be transferred between the nodes. We adopted the migration facility provided by Xen virtualization to realize a load balancing system where an MPI process running on a Xen virtual machine is migrated between the nodes. We confirmed that the prototype system offers efficient load balancing facilities for heterogeneous computing environments with low overhead incurred by Xen virtualization.
ISBN (print): 9798350311990
Given the ubiquity of parallel computing hardware, we introduced parallel programming with pictures to the block-based Snap! environment and called it pSnap!, short for parallel Snap! We then created an accessible curriculum for students of all ages to learn how to program serially and then how to program with explicit parallelism. This paper presents a new and innovative extension to our curriculum on parallel programming with pSnap!, one that broadens its appeal to the masses by teaching the application of parallel programming as a "choose your own learning adventure" activity, inspired by the Choose Your Own Adventure book series of the 1980s and 1990s. Specifically, after students learn the basics of parallel programming with pictures, they are ready to choose their next learning adventure, which applies their newfound parallel programming skills to create a video game of their choice, e.g., Missile Command or Do You Want to Build a Snowman?
ISBN (print): 9781450336185
GPU and multicore hardware architectures are commonly used in many different application areas to accelerate problem solutions relative to single CPU architectures. The typical approach to accessing these hardware architectures requires embedding logic into the programming language used to construct the application; the two primary forms of embedding are calls to API routines to access the concurrent functionality, or pragmas providing concurrency hints to a language compiler such that particular blocks of code are targeted to the concurrent functionality. The former approach is verbose and semantically bankrupt, while the success of the latter approach is restricted to simple, static uses of the functionality. This paper presents an extension to an existing actor-based programming model and runtime to support executing applications on parallel hardware architectures. Besides the glove-like fit of a kernel to the actor abstraction, quantitative code analysis shows that actor-based kernels are always significantly simpler than API-based coding, and generally simpler than pragma-based coding. Structuring applications in this manner enables the runtime to automate the initialisation of, and interaction with, these parallel hardware platforms. Performance measurements show that the overheads of actor-based kernels are commensurate with those of API-based kernels, and range from equivalent to vastly improved relative to pragma-based annotations, both for sample and real-world applications.
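To make the "kernel as actor" idea concrete, here is a minimal sketch, assuming a plain Python thread stands in for the parallel device. It is not the programming model or runtime from the paper; `KernelActor` and its methods are hypothetical names. The point it illustrates is that the caller only sends messages, while set-up, execution, and tear-down are owned by the actor.

```python
import queue
import threading
from concurrent.futures import Future
from typing import Any, Callable


class KernelActor:
    """Hypothetical actor that wraps a kernel function behind a mailbox."""

    def __init__(self, kernel: Callable[[Any], Any]) -> None:
        self._kernel = kernel
        self._mailbox: "queue.Queue" = queue.Queue()
        # In a real runtime, device initialisation would happen here.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, data: Any) -> Future:
        """Post a message; the returned future resolves to the kernel's result."""
        reply: Future = Future()
        self._mailbox.put((data, reply))
        return reply

    def stop(self) -> None:
        """Ask the actor to finish pending messages and shut down."""
        self._mailbox.put(None)
        self._thread.join()

    def _run(self) -> None:
        while True:
            message = self._mailbox.get()
            if message is None:
                return
            data, reply = message
            try:
                reply.set_result(self._kernel(data))
            except Exception as exc:  # report kernel failures to the sender
                reply.set_exception(exc)
```

A caller would create `KernelActor(some_kernel)`, call `send(data).result()` to obtain an answer, and call `stop()` when done; no device-specific API calls or pragmas appear in the calling code.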