The stream processing paradigm is used in several scientific and enterprise applications in order to continuously compute results out of data items coming from data sources such as sensors. The full exploitation of th...
详细信息
ISBN:
(纸本)9781538655559
The stream processing paradigm is used in several scientific and enterprise applications in order to continuously compute results out of data items coming from data sources such as sensors. The full exploitation of the potential parallelism offered by current heterogeneous multi-cores equipped with one or more GPUs is still a challenge in the context of stream processing applications. In this work, our main goal is to present the parallel programming challenges that the programmer has to face when exploiting CPUs and GPUs' parallelism at the same time using traditional programming models. We highlight the parallelization methodology in two use-cases (the Mandelbrot Streaming benchmark and the PARSEC's Dedup application) to demonstrate the issues and benefits of using heterogeneous parallel hardware. The experiments conducted demonstrate how a high-level parallel programming model targeting stream processing like the one offered by SPar can be used to reduce the programming effort still offering a good level of performance if compared with state-of-the-art programming models.
This paper presents CHAOS-MCAPI (Communication Header and Operating Support-Multicore Communication API), an IPC mechanism targeting parallel programming based on message passing on multicore platforms. The proposed m...
详细信息
ISBN:
(纸本)9781467386210
This paper presents CHAOS-MCAPI (Communication Header and Operating Support-Multicore Communication API), an IPC mechanism targeting parallel programming based on message passing on multicore platforms. The proposed mechanism is built on top of the D-Bus protocol for message transmission, which allows a higher abstraction level and control when compared to lower-level mechanisms such as UNIX Pipes. Optimizations adopted by the implementation of CHAOS-MCAPI resulted in significant performance gains in relation to the original D-Bus implementation, which should be further improved by the adoption of KDBus, a 'zero-copy' mechanism recently made available natively in the Linux Kernel. That should make CHAOS-MCAPI a viable alternative for the design and implementation of parallel programs targeting multicore platforms, both in terms of scalability and programmer's productivity.
This paper describes how we utilized cooperative learning to meet the practical challenges of teaching parallel programming in the early college years, as well as to provide a more real world context to the course. Ou...
详细信息
ISBN:
(纸本)1581133294
This paper describes how we utilized cooperative learning to meet the practical challenges of teaching parallel programming in the early college years, as well as to provide a more real world context to the course. Our main contribution is a set of cooperative group activities for both inside and outside the classroom, which are targeted to the computer science discipline, have received very positive student feedback, are easy to implement, and achieve a number of learning objectives beyond knowledge of the specific topic. These activities can be applied directly or be easily adapted to other computer science courses, particularly programming, systems, and experimental computer science courses.
The popularization of parallelism is arguably the most fundamental computing challenge for years to come. We present an approach where parallel programming takes place in a restricted (sub-Turing-complete), logic-base...
详细信息
ISBN:
(纸本)9783642310577;9783642310560
The popularization of parallelism is arguably the most fundamental computing challenge for years to come. We present an approach where parallel programming takes place in a restricted (sub-Turing-complete), logic-based declarative language, embedded in Java. Our logic-based language, PQL, can express the parallel elements of a computing task, while regular Java code captures sequential elements. This approach offers a key property: the purely declarative nature of our language allows for aggressive optimization, in much the same way that relational queries are optimized by a database engine. At the same time, declarative queries can operate on plain Java data, extending patterns such as map-reduce to arbitrary levels of nesting and composition complexity. We have implemented PQL as extension to a Java compiler and showcase its expressiveness as well as its scalability compared to competitive techniques for similar tasks (Java + relational queries, in-memory Hadoop, etc.).
We are developing a shared-variable refinement calculus in the style of the sequential calculi of Back, Morgan, and Morris. As part of this work, we're studying different theories of shared-variable programming. U...
详细信息
ISBN:
(纸本)3540000291
We are developing a shared-variable refinement calculus in the style of the sequential calculi of Back, Morgan, and Morris. As part of this work, we're studying different theories of shared-variable programming. Using the concepts and notations of Hoare & He's unifying theories of programming (UTP), we give a formal semantics to a programming language that contains sequential composition, conditional statements, while loops, nested parallel composition, and shared variables. We first give a UTP semantics to labelled action systems, and then use this to give the semantics of our programs. Labelled action systems have a unique normal form that allows a simple formalisation and validation of different logics for reasoning about shared-variable programs. In this paper, we demonstrate how this is done for Lamport's Concurrent Hoare Logic.
The first Spanish parallel programming Contest was organized in September 2011 within the Spanish Jornadas de Paralelismo. The aim of the contest is to disseminate parallelism among Computer Science students. The webs...
详细信息
ISBN:
(纸本)9780769546766
The first Spanish parallel programming Contest was organized in September 2011 within the Spanish Jornadas de Paralelismo. The aim of the contest is to disseminate parallelism among Computer Science students. The website and the material generated can be used for educational purposes. This paper comments on the organization of the contest and summarizes some training activities in which the material of the contest is being or can be used.
parallel programs are increasingly being written using programming frameworks and other environments that allow parallel constructs to be programmed with greater ease. The data structures used allow the modeling of co...
详细信息
ISBN:
(纸本)9783642038686
parallel programs are increasingly being written using programming frameworks and other environments that allow parallel constructs to be programmed with greater ease. The data structures used allow the modeling of complex mathematical structures like linear systems and partial differential equations using high-level programming abstractions. While this allows programmers to model complex systems in a more intuitive way, it;also makes the debugging and profiling of these systems more difficult due to the complexity of mapping these high level abstractions down to die low level parallel programming constructs. This work discusses mapping mechanisms, called variable blame, for creating these wrappings and using them to assist in the profiling and debugging of programs created using advanced parallel programming techniques. We also include an example of a. prototype implementation of the system profiling three programs.
In this work, the parallel Hierarchical Interface Decomposition Algorithm (PHIDAL) and a hybrid parallel programming model were applied to finite-element based simulations of linear elasticity problems in media with h...
详细信息
ISBN:
(纸本)9783540754435
In this work, the parallel Hierarchical Interface Decomposition Algorithm (PHIDAL) and a hybrid parallel programming model were applied to finite-element based simulations of linear elasticity problems in media with heterogeneous material properties using parallel preconditioned iterative solvers. Reverse Cuthill-McKee reordering with cyclic multicoloring (CM-RCM) was applied for parallelism on each SNIP node through OpenMP. The developed code has been tested on the IBM p5-575 and the TSUBAME Grid Cluster using up to 512 cores. Preconditioners based on PHIDAL provide a superior scalable performance and robustness on both architectures in comparison to conventional block Jacobi-type localized preconditioners.
Multicore computers are ubiquitous, and proposals to extend existing languages with parallel constructs mushroom. While everyone claims to make parallel programming easier and less error-prone, empirical language usab...
详细信息
ISBN:
(纸本)9781450304450
Multicore computers are ubiquitous, and proposals to extend existing languages with parallel constructs mushroom. While everyone claims to make parallel programming easier and less error-prone, empirical language usability evaluations are rarely done in-the-field with many users and real programs. Key obstacles are costs and a lack of appropriate environments to gather enough data for representative conclusions. This paper discusses the idea of automating the usability evaluation of parallel language constructs by gathering subjective and objective data directly in every software engineer's IDE. The paper presents an Eclipse prototype suite that can aggregate such data from potentially hundreds of thousands of programmers. Mismatch detection in subjective and objective feedback as well as construct usage mining can improve language design at an early stage, thus reducing the risk of developing and maintaining inappropriate constructs. New research directions arising from this idea are outlined for software repository mining, debugging, and software economics.
Single-board computers have recently grown to offer developers a wide range of options where the common denominators are low power and low cost. In this paper, we present an embedded cluster platform for a remote para...
详细信息
ISBN:
(纸本)9781728109305
Single-board computers have recently grown to offer developers a wide range of options where the common denominators are low power and low cost. In this paper, we present an embedded cluster platform for a remote parallel programming lab to be used in an online course. A remote lab server handles all requests coming from the front-end running on an online learning platform and controls the execution of the parallel programming assignments submitted by students. The embedded cluster where the jobs run is made out of single-board computers connected through a gigabit network among them and to the lab server. In our first working prototype, we have tested six different state-of-the-art single-board computers, evaluating their processing latency, price, and tools compatibility. We found that the Vim3Pro performed best overall, being the fastest in most tests, having a mid-range price, and being only two times slower than a much more expensive high-end Xeon processor when using the same amount of cores.
暂无评论