FastFlow is an open-source, structured parallel programming framework originally conceived to support highly efficient stream-parallel computation targeting shared-memory multicores. Its efficiency mainly comes...
Bounded-exhaustive testing (BET), which exercises a program under test for all inputs up to some bounds, is an effective method for detecting software bugs. Systematic property-based testing is a BET approach where developers write test generation programs that describe properties of test inputs. Hybrid test generation programs offer the most expressive way to write desired properties by freely combining declarative filters and imperative generators. However, exploring hybrid test generation programs, to obtain test inputs, is both computationally demanding and challenging to parallelize. We present the first programming and execution models, dubbed Tempo, for parallel exploration of hybrid test generation programs. We describe two different strategies for mapping the computation to parallel hardware and implement them both for GPUs and CPUs. We evaluated Tempo by generating instances of various data structures commonly used for benchmarking in the BET domain. Additionally, we generated CUDA programs to stress test CUDA compilers, finding four bugs confirmed by the developers.
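The abstract does not show Tempo's own API, so the sketch below only illustrates what a hybrid test generation program typically looks like in the BET style it describes: an imperative generator enumerates candidate inputs up to a bound, and a declarative filter keeps only the inputs satisfying the desired property. All names here (generateArrays, isSortedUnique) are hypothetical, not Tempo's interface.

```cpp
// Hypothetical hybrid BET generator: imperative enumeration plus a declarative filter.
#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>

// Declarative filter: the property a valid test input must satisfy.
bool isSortedUnique(const std::vector<int>& a) {
    return std::adjacent_find(a.begin(), a.end(),
                              [](int x, int y) { return x >= y; }) == a.end();
}

// Imperative generator: enumerate every array of length `len` with values
// in [0, maxVal), invoking `emit` on each candidate.
void generateArrays(int len, int maxVal, std::vector<int>& current,
                    const std::function<void(const std::vector<int>&)>& emit) {
    if ((int)current.size() == len) { emit(current); return; }
    for (int v = 0; v < maxVal; ++v) {
        current.push_back(v);
        generateArrays(len, maxVal, current, emit);
        current.pop_back();
    }
}

int main() {
    const int bound = 3;                       // exhaustive up to this size
    long generated = 0, kept = 0;
    for (int len = 0; len <= bound; ++len) {
        std::vector<int> cur;
        generateArrays(len, bound, cur, [&](const std::vector<int>& a) {
            ++generated;
            if (isSortedUnique(a)) ++kept;     // only filtered inputs reach the program under test
        });
    }
    std::cout << "candidates: " << generated << ", valid test inputs: " << kept << "\n";
}
```

Exploring such a program in parallel is hard precisely because the generator's recursion tree is irregular and the filters prune it unpredictably, which is the problem the paper's two mapping strategies address.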
This paper proposes a parallel programming scheme for the cross-point array with resistive random access memory (RRAM). Synaptic plasticity in unsupervised learning is realized by tuning the conductance of each RRAM cell. Inspired by spike-timing-dependent plasticity (STDP), the programming strength is encoded into the spike firing rate (i.e., pulse frequency) and the overlap time (i.e., duty cycle) of the pre-synaptic and post-synaptic nodes, and applied simultaneously to all RRAM cells in the cross-point array. Such an approach achieves parallel programming of the entire RRAM array while requiring only local information from the pre-synaptic and post-synaptic nodes of each cell. As demonstrated by digital peripheral circuits implemented in 65 nm CMOS, the programming time of a 40 kb RRAM array is 84 ns, a 900X speedup compared to a state-of-the-art software approach to sparse coding for image feature extraction.
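The abstract does not give the programming equations, so the following is only a back-of-the-envelope model of the idea it describes: each cell's conductance update is driven by how often and how long its row (pre-synaptic) and column (post-synaptic) pulse trains overlap, so every cell in the crossbar can be updated in the same shared programming window using only local information. The update rule and the numbers are illustrative assumptions, not the paper's circuit.

```cpp
// Toy model of frequency/duty-cycle encoded parallel programming of a crossbar.
#include <algorithm>
#include <iostream>
#include <vector>

struct Node { double freq; double duty; };   // pulse frequency (Hz) and duty cycle in [0,1]

int main() {
    const int rows = 4, cols = 4;
    const double window = 84e-9;             // one shared programming window (e.g. the 84 ns reported)
    std::vector<Node> pre(rows), post(cols);
    for (int i = 0; i < rows; ++i) pre[i]  = {1e8 * (i + 1) / rows, 0.5};
    for (int j = 0; j < cols; ++j) post[j] = {1e8 * (j + 1) / cols, 0.5};

    // Conductance update for every cell, derived only from its local row/column nodes.
    std::vector<std::vector<double>> dG(rows, std::vector<double>(cols));
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j) {
            double overlap = pre[i].duty * post[j].duty;                   // assumed overlap fraction
            double pulses  = std::min(pre[i].freq, post[j].freq) * window; // pulses per window
            dG[i][j] = overlap * pulses;                                   // arbitrary units
        }

    for (auto& row : dG) {
        for (double g : row) std::cout << g << ' ';
        std::cout << '\n';
    }
}
```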
ISBN (print): 9781467376853
The biggest difficulty that students face when learning programming is in developing the necessary cognitive skills that allow them to apply what they have learnt. It is generally accepted that programming is one of those things that can only be learnt by doing and actively engaging with it. Parallel programming is a prime example of a programming area that students commonly struggle with. A major inhibitor is the abstract nature of some of its concepts, which makes it difficult to gain a true understanding of the underlying principles in a traditional classroom setting. This paper discusses the underlying principles that motivated the development of Active Classroom Programmer (ACP), a tool for students to learn effective programming strategies with the guidance of their instructor. ACP aims to increase students' skills in applying programming topics by immediately engaging them with the newly introduced material. This is especially important in parallel programming, as the topics quickly progress onto the many parallelisation caveats (such as thread-safety, race conditions, and so on). While laboratory or homework exercises provide students with valuable hands-on experience (to apply newly taught concepts), this opportunity generally arrives too long after the material is presented in the lesson. To address this, a collection of parallel programming exercises is being developed for the NSF/IEEE-TCPP Curriculum Initiative on Parallel and Distributed Computing (as an Early Adopter award), with the help of ACP. Instructors are welcome to utilise any of the developed exercises, or even request a private ACP account for their own courses to program with their students.
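The ACP exercise set itself is not reproduced in the abstract; as a generic illustration of the kind of caveat it mentions, the snippet below shows a classic race-condition exercise in C++, with a flag to switch the fix on and off. It is illustrative only and not taken from the ACP materials.

```cpp
// Race-condition exercise: two threads increment a shared counter.
// With useLock == false, "++counter" is a data race (read, add, write can
// interleave), so the final value is usually below 2,000,000.
#include <iostream>
#include <mutex>
#include <thread>

int main() {
    long counter = 0;
    std::mutex m;
    const bool useLock = false;   // flip to true to fix the race

    auto work = [&] {
        for (int i = 0; i < 1'000'000; ++i) {
            if (useLock) { std::lock_guard<std::mutex> g(m); ++counter; }
            else         { ++counter; }                     // unsynchronized update
        }
    };
    std::thread t1(work), t2(work);
    t1.join(); t2.join();
    std::cout << "counter = " << counter << " (expected 2000000)\n";
}
```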
Concurrent programming tools strive to exploit hardware resources as much as possible. Nonetheless, the lack of high-level abstraction in such tools often requires a considerable amount of knowledge from the user to achieve satisfactory performance, and they do little to prevent error-prone situations. In this paper we present Kanga, a framework based on the abstraction of skeletons, providing a generic tool that encapsulates many common parallel patterns. Through two case studies we validate the framework implementation.
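Kanga's interface is not shown in the abstract; as a rough idea of what a skeleton abstraction buys, the sketch below is a minimal generic map skeleton in C++ that hides thread creation and data partitioning behind a single call. The names are illustrative and not Kanga's API.

```cpp
// Minimal "map" skeleton: the user supplies only the per-element function;
// partitioning and thread management are hidden inside the skeleton.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <thread>
#include <vector>

template <typename T, typename F>
void mapSkeleton(std::vector<T>& data, F f,
                 unsigned workers = std::thread::hardware_concurrency()) {
    if (workers == 0) workers = 1;
    std::vector<std::thread> pool;
    std::size_t chunk = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        std::size_t begin = w * chunk, end = std::min(data.size(), begin + chunk);
        if (begin >= end) break;
        pool.emplace_back([&data, f, begin, end] {
            for (std::size_t i = begin; i < end; ++i) data[i] = f(data[i]);
        });
    }
    for (auto& t : pool) t.join();
}

int main() {
    std::vector<int> v(16);
    for (int i = 0; i < 16; ++i) v[i] = i;
    mapSkeleton(v, [](int x) { return x * x; });   // user code: just the element-wise action
    for (int x : v) std::cout << x << ' ';
    std::cout << '\n';
}
```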
Structured parallel programming, and in particular programming models using the algorithmic skeleton or parallel design pattern concepts, is increasingly considered to be the only viable means of supporting effective development of scalable and efficient parallel programs. Structured parallel programming models have been assessed in a number of works in the context of performance. In this paper we consider how the use of structured parallel programming models allows knowledge of the parallel patterns present to be harnessed to address both performance and energy consumption. We consider different features of structured parallel programming that may be leveraged to impact the performance/energy trade-off, and we discuss a preliminary set of experiments validating our claims.
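The abstract does not list the concrete knobs; one commonly cited example of exploiting pattern knowledge for the performance/energy trade-off is resizing a farm at run time, since fewer workers can mean lower power at the cost of throughput. The sketch below only times a toy farm at different sizes to show that the degree of parallelism is an explicit, tunable parameter; it is not the paper's experimental setup, and energy measurement is not shown.

```cpp
// With a structured pattern, the degree of parallelism is an explicit parameter
// that a runtime could tune to trade throughput against power/energy.
#include <chrono>
#include <cmath>
#include <iostream>
#include <thread>
#include <vector>

double busyWork(double x) {                  // stand-in for a per-task computation
    for (int i = 0; i < 20000; ++i) x = std::sin(x) + 1.0;
    return x;
}

double runFarm(unsigned workers, int tasks) {
    auto t0 = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w)
        pool.emplace_back([=] {
            for (int i = (int)w; i < tasks; i += (int)workers) busyWork(i);
        });
    for (auto& t : pool) t.join();
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}

int main() {
    for (unsigned w : {1u, 2u, 4u})          // candidate farm sizes
        std::cout << w << " workers: " << runFarm(w, 400) << " s\n";
}
```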
The General Purpose GPU computational model changes the way parallel processing can be achieved. It is becoming more attractive to carry out parallel tasks on GPU devices. The sequential part of the application runs on the CPU whereas the computationally intensive part is accelerated by the GPU. GPUs provide a multithreaded, high level of parallelism with hundreds of cores. For high performance computing developers, the GPU cores offer an order of magnitude more raw computational power than the CPU. In this paper we propose an efficient parallel programming framework based on GPU devices. This framework adopts the Gamma formalism as an abstract model for making parallelism less difficult. The software developer has only to specify the action to be carried out on any atomic portion of data. The framework will then run the given action simultaneously on the GPU cores.
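The framework's API is not given in the abstract, so the sketch below only illustrates the Gamma programming model it builds on: the developer supplies a reaction (a condition plus an action on a pair of multiset elements) and a runtime applies it repeatedly until no pair reacts. The hypothetical runGamma() here runs sequentially on the CPU just to show the model; the paper's framework would try many reactions concurrently on GPU cores.

```cpp
// Gamma-style multiset rewriting: the classic "compute the maximum" reaction,
// where any two elements react and the smaller one disappears.
#include <cstddef>
#include <iostream>
#include <vector>

template <typename T, typename Cond, typename Act>
void runGamma(std::vector<T>& multiset, Cond reacts, Act combine) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t i = 0; i < multiset.size() && !changed; ++i)
            for (std::size_t j = i + 1; j < multiset.size() && !changed; ++j)
                if (reacts(multiset[i], multiset[j])) {
                    T merged = combine(multiset[i], multiset[j]);
                    multiset.erase(multiset.begin() + j);   // consume the reacting pair,
                    multiset[i] = merged;                   // produce the result
                    changed = true;
                }
    }
}

int main() {
    std::vector<int> bag{7, 3, 42, 5, 19};
    runGamma(bag, [](int, int) { return true; },                 // condition: any pair reacts
                  [](int a, int b) { return a > b ? a : b; });   // action: keep the larger
    std::cout << "maximum = " << bag.front() << '\n';            // single surviving element
}
```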
Using traditional methods, it is very difficult to develop high-quality, portable software for parallel computers. In particular, parallel software cannot be developed on low-cost, sequential computers and then moved to high-performance parallel computers without extensive rewriting and debugging. This paper considers the CSS system, under development at the Institute of Informatics Systems. CSS is intended to be an interactive visual environment supporting functional programming and cloud supercomputing. The input language of the CSS system is the functional language Cloud Sisal, which exposes implicit parallelism through data dependence and guarantees a determinate result. The CSS system provides the means to write and debug functional programs on low-cost devices regardless of the target architecture, to translate them into optimized parallel programs appropriate to the target execution platforms, and then to execute them on high-performance parallel computers without extensive rewriting and debugging.
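Cloud Sisal syntax is not shown in the abstract, so rather than guess at it, the snippet below illustrates the underlying idea in C++: when two computations have no data dependence on each other, a runtime is free to evaluate them in parallel, and side-effect-free, single-assignment style guarantees the same (determinate) result either way. Here std::async merely stands in for parallelism that a compiler like Cloud Sisal's would insert implicitly.

```cpp
// f(x) and g(x) do not depend on each other's results, so they may run in
// parallel; the join point (a.get() + b.get()) is the only data dependence.
#include <future>
#include <iostream>

long long f(long long x) { long long s = 0; for (long long i = 0; i < x; ++i) s += i; return s; }
long long g(long long x) { long long p = 1; for (int i = 0; i < 20; ++i) p *= (x % 7) + 1; return p; }

int main() {
    long long x = 1'000'000;
    auto a = std::async(std::launch::async, f, x);   // independent computation 1
    auto b = std::async(std::launch::async, g, x);   // independent computation 2
    std::cout << "result = " << a.get() + b.get() << '\n';
}
```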
ISBN (print): 9781467376853
Despite the fact that we are firmly in the multicore era, the use of parallel programming is not as widespread as it could be, either in the software industry or in education. There have been many calls to incorporate more parallel programming content into undergraduate computer science education. One obstacle to doing this is that the programming languages most commonly used for parallel programming are detailed, low-level languages such as C, C++, Fortran (with OpenMP or MPI), OpenCL and CUDA. These languages allow programmers to write very efficient code, but that is not so important for those whose goal is to learn the concepts of parallel computing. This paper introduces a parallel programming language called Tetra, which provides parallelism as a first-class language feature, offers garbage collection, and is designed to be as simple as possible. Tetra also includes an integrated development environment specifically geared toward debugging parallel programs and visualizing program execution across multiple threads.
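Tetra's own syntax is not reproduced in the abstract, so rather than guess at it, the snippet below shows the kind of detailed, lower-level code (C++ with OpenMP) that the paper argues is unnecessarily heavyweight for teaching parallel concepts: the programmer must get pragmas, scheduling, and reduction clauses right by hand.

```cpp
// The "low-level" style Tetra is positioned against. Compile with: g++ -fopenmp sum.cpp
#include <iostream>
#include <vector>

int main() {
    const int n = 1'000'000;
    std::vector<double> v(n, 1.0);
    double sum = 0.0;

    // The reduction clause is essential: omitting it silently introduces a data race.
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (int i = 0; i < n; ++i)
        sum += v[i];

    std::cout << "sum = " << sum << '\n';   // expected 1000000
}
```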
This paper presents CHAOS-MCAPI (Communication Header and Operating Support-Multicore Communication API), an IPC mechanism targeting parallel programming based on message passing on multicore platforms. The proposed mechanism is built on top of the D-Bus protocol for message transmission, which allows a higher level of abstraction and control compared to lower-level mechanisms such as UNIX pipes. Optimizations adopted in the implementation of CHAOS-MCAPI resulted in significant performance gains over the original D-Bus implementation, and these should be further improved by the adoption of KDBus, a 'zero-copy' mechanism recently made available natively in the Linux kernel. That should make CHAOS-MCAPI a viable alternative for the design and implementation of parallel programs targeting multicore platforms, in terms of both scalability and programmer productivity.
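Neither the CHAOS-MCAPI calls nor its D-Bus usage appear in the abstract, so to avoid guessing at that API, the sketch below shows the lower-level baseline it is contrasted with: raw message passing between two processes over a UNIX pipe, where framing, buffering, and error handling are entirely the programmer's problem. This is exactly the plumbing a higher-abstraction IPC layer is meant to hide.

```cpp
// POSIX pipe message passing between a parent (sender) and child (receiver).
#include <cstdio>
#include <cstring>
#include <iostream>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    int fd[2];
    if (pipe(fd) != 0) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {                       // child: the "receiver" endpoint
        close(fd[1]);
        char buf[64] = {};
        ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
        std::cout << "child received " << n << " bytes: " << buf << '\n';
        close(fd[0]);
        return 0;
    }
    close(fd[0]);                         // parent: the "sender" endpoint
    const char* msg = "hello from the parent";
    write(fd[1], msg, std::strlen(msg) + 1);
    close(fd[1]);
    waitpid(pid, nullptr, 0);
}
```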