As the Pawsey Centre project continued, in 2013 iVEC was tasked with deciding which accelerator technology to use in the petascale supercomputer to be delivered in mid-2014. While accelerators provide impressive performance and efficiency, an important factor in this decision is the usability of the technologies. To assist in the assessment, iVEC conducted a code sprint in which iVEC staff and advanced users were paired to use a range of tools to port their codes to two architectures. Results of the sprint indicate that certain subtasks could benefit from using the tools in the code-acceleration process; however, users will face many hurdles in migrating to either of the platforms explored.
ISBN: (Print) 9781538638873
A hash function hashes a message of arbitrary length into a much shorter bit string of fixed length, called a hash. Inevitably, many different messages are hashed to the same or a similar hash; we call this a hash collision or a partial hash collision. By utilizing multiple processors from the CUNY High Performance Computing Center's clusters, we can locate partial collisions for the hash functions MD5 and SHA1 by brute-force parallel programming in C with the MPI library. The brute-force method of finding a second-preimage collision entails systematically computing all of the permutations, hashes, and Hamming distances against the target preimage. We explore varying target string sizes and numbers of allocated processors to examine the effect these variables have on finding partial collisions. The results show that, for the same message space, the search time for partial collisions is roughly halved for each doubling of the number of processors; the longer the message, the better the partial collisions produced.
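The brute-force partial-collision search described in this abstract can be sketched as follows. This is an illustrative single-node Python version, not the paper's C/MPI code: threads stand in for MPI ranks, each exhaustively searching one slice of a deliberately tiny message space (a 6-letter alphabet, 4-character messages) and reporting the candidate whose MD5 digest is closest in Hamming distance to the target's.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def hamming(a: bytes, b: bytes) -> int:
    # Count differing bits between two equal-length digests.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

TARGET = hashlib.md5(b"target message").digest()
ALPHABET = "abcdef"
MSG_LEN = 4

def search_chunk(prefix: str) -> tuple:
    # Each worker (standing in for one MPI rank) hashes every candidate
    # that shares its prefix and returns the closest partial collision.
    best = None
    for tail in product(ALPHABET, repeat=MSG_LEN - 1):
        msg = (prefix + "".join(tail)).encode()
        d = hamming(hashlib.md5(msg).digest(), TARGET)
        if best is None or d < best[0]:
            best = (d, msg)
    return best

# Partition the message space by first character, one slice per worker.
with ThreadPoolExecutor(max_workers=len(ALPHABET)) as pool:
    results = list(pool.map(search_chunk, ALPHABET))
best_dist, best_msg = min(results)
print(best_dist, best_msg)
```

With only 6^4 = 1296 candidates against a 128-bit digest, the best Hamming distance found is a partial collision, far from the exact collision a full search would pursue; the point is the embarrassingly parallel partitioning of the space.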
ISBN: (Print) 3540653872
We have used the Illinois Concert C++ system (which supports dynamic, object-based parallelism) to parallelize a flexible adaptive mesh refinement code for the Cosmology NSF Grand Challenge. Our goal is to enable programmers of large-scale numerical applications to build complex applications with irregular structure using a high-level interface. The key elements are an aggressive optimizing compiler and runtime system support that harnesses the performance of the SGI-Cray Origin 2000 shared memory architecture. We have developed a configurable runtime system and a flexible Structured Adaptive Mesh Refinement (SAMR) application that runs with good performance. We describe the programming of SAMR using the Illinois Concert System, which is a concurrent object-oriented parallel programming interface, documenting the modest parallelization effort. We obtain good performance, with a speedup of up to 24.4 on 32 processors of the Origin 2000. We also present results addressing the effect of virtual machine configuration and parallel grain size on performance. Our study characterizes the SAMR application and how our programming system design assists in parallelizing dynamic codes using high-level programming.
ISBN: (Print) 9783642400476
Parallel programming is often regarded as one of the hardest programming disciplines. On the one hand, parallel programs are notoriously prone to concurrency errors; on the other, achieving program performance while trying to avoid such errors is a significant challenge. As a result of the multicore revolution, however, parallel programming has ceased to be a task for domain experts only. For this reason, a large variety of languages and libraries have been proposed that promise to ease this task. This paper presents a study to investigate whether such approaches succeed in closing the gap between domain experts and mainstream developers. Four approaches are studied: Chapel, Cilk, Go, and Threading Building Blocks (TBB). Each approach is used to implement a suite of benchmark programs, which are then reviewed by notable experts in the language. By comparing original and revised versions with respect to source code size, coding time, execution time, and speedup, we gain insights into the importance of expert knowledge when using modern parallel programming approaches.
ISBN: (Print) 9781665441735
Automatic parallelizing compilers are often constrained in their transformations because they must conservatively respect data dependences within the program. Developers, on the other hand, often take advantage of domain-specific knowledge to apply transformations that modify data dependences but respect the application's semantics. This creates a semantic gap between the parallelism extracted automatically by compilers and manually by developers. Although prior work has proposed programming language extensions to close this semantic gap, their relative contribution is unclear and it is uncertain whether compilers can actually achieve the same performance as manually parallelized code when using them. We quantify this semantic gap in a set of sequential and parallel programs and leverage these existing programming-language extensions to empirically measure the impact of closing it for an automatic parallelizing compiler. This lets us achieve an average speedup of 12.6x on an Intel-based 28-core machine, matching the speedup obtained by the manually parallelized code. Further, we apply these extensions to widely used sequential system tools, obtaining 7.1x speedup on the same system.
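The semantic gap this abstract describes can be illustrated with a reduction. A conservative compiler sees a loop-carried dependence on the accumulator and must serialize the loop; the developer knows integer addition is associative and commutative, so disjoint chunks can be summed independently and combined in any order. A minimal Python sketch of that idea (not the paper's compiler or language extensions):

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(1, 100_001))

def sequential_sum(xs):
    # The running accumulator is a loop-carried dependence:
    # each iteration reads the value written by the previous one,
    # so a conservative compiler cannot reorder or split this loop.
    acc = 0
    for x in xs:
        acc += x
    return acc

def parallel_sum(xs, workers=4):
    # Domain knowledge breaks the dependence: integer addition is
    # associative and commutative, so partial sums over disjoint
    # chunks may be computed independently and merged afterwards.
    chunks = [xs[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunks))

assert parallel_sum(data) == sequential_sum(data)
```

Note that the same transformation is unsafe for floating-point data unless the developer explicitly accepts reassociation, which is exactly the kind of semantic information the paper's extensions convey to the compiler.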
ISBN: (Print) 9781538655559
The stream processing paradigm is used in several scientific and enterprise applications in order to continuously compute results out of data items coming from data sources such as sensors. The full exploitation of the potential parallelism offered by current heterogeneous multi-cores equipped with one or more GPUs is still a challenge in the context of stream processing applications. In this work, our main goal is to present the parallel programming challenges that the programmer has to face when exploiting CPU and GPU parallelism at the same time using traditional programming models. We highlight the parallelization methodology in two use-cases (the Mandelbrot Streaming benchmark and the PARSEC's Dedup application) to demonstrate the issues and benefits of using heterogeneous parallel hardware. The experiments conducted demonstrate how a high-level parallel programming model targeting stream processing, like the one offered by SPar, can be used to reduce the programming effort while still offering a good level of performance compared with state-of-the-art programming models.
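The Mandelbrot Streaming pattern mentioned above can be sketched as a source stage feeding a queue that a pool of worker threads consumes. This hypothetical CPU-only Python version illustrates only the pipeline structure (source, queue, parallel workers, sentinel shutdown), not SPar itself or any GPU offloading:

```python
import queue
import threading

def escape_iters(c: complex, limit: int = 100) -> int:
    # Classic Mandelbrot kernel: iterations until |z| exceeds 2.
    z = 0j
    for i in range(limit):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return limit

tasks = queue.Queue()
results = {}
lock = threading.Lock()

def worker():
    # Consumer stage: pull stream items until the sentinel arrives.
    while True:
        item = tasks.get()
        if item is None:
            break
        idx, c = item
        n = escape_iters(c)
        with lock:
            results[idx] = n

# Source stage: emit a stream of points along the real axis.
points = [complex(-2 + 0.1 * k, 0.0) for k in range(40)]
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for idx, c in enumerate(points):
    tasks.put((idx, c))
for _ in threads:
    tasks.put(None)  # one sentinel per worker
for t in threads:
    t.join()
```

In a real heterogeneous setup, each worker would instead dispatch a batch of points to a GPU kernel, which is where the programming-effort issues the paper discusses arise.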
ISBN: (Print) 1581133294
This paper describes how we utilized cooperative learning to meet the practical challenges of teaching parallel programming in the early college years, as well as to provide a more real world context to the course. Our main contribution is a set of cooperative group activities for both inside and outside the classroom, which are targeted to the computer science discipline, have received very positive student feedback, are easy to implement, and achieve a number of learning objectives beyond knowledge of the specific topic. These activities can be applied directly or be easily adapted to other computer science courses, particularly programming, systems, and experimental computer science courses.
ISBN: (Print) 9781467386210
This paper presents CHAOS-MCAPI (Communication Header and Operating Support-Multicore Communication API), an IPC mechanism targeting parallel programming based on message passing on multicore platforms. The proposed mechanism is built on top of the D-Bus protocol for message transmission, which allows a higher abstraction level and control when compared to lower-level mechanisms such as UNIX Pipes. Optimizations adopted by the implementation of CHAOS-MCAPI resulted in significant performance gains in relation to the original D-Bus implementation, which should be further improved by the adoption of KDBus, a 'zero-copy' mechanism recently made available natively in the Linux Kernel. That should make CHAOS-MCAPI a viable alternative for the design and implementation of parallel programs targeting multicore platforms, both in terms of scalability and programmer's productivity.
ISBN: (Print) 9783642310577; 9783642310560
The popularization of parallelism is arguably the most fundamental computing challenge for years to come. We present an approach where parallel programming takes place in a restricted (sub-Turing-complete), logic-based declarative language, embedded in Java. Our logic-based language, PQL, can express the parallel elements of a computing task, while regular Java code captures sequential elements. This approach offers a key property: the purely declarative nature of our language allows for aggressive optimization, in much the same way that relational queries are optimized by a database engine. At the same time, declarative queries can operate on plain Java data, extending patterns such as map-reduce to arbitrary levels of nesting and composition complexity. We have implemented PQL as an extension to a Java compiler and showcase its expressiveness as well as its scalability compared to competitive techniques for similar tasks (Java + relational queries, in-memory Hadoop, etc.).
ISBN: (Print) 9788394941956
Parallel programming is a field with great potential nowadays due to the development of advanced computer architectures. Appropriate usage of this tool can therefore be highly beneficial in multimedia applications and can significantly decrease calculation time. In this article, we analyze how the speed of calculations is influenced by the usage of parallel algorithms in image filtering processes. We present a method based on multithreading and the division of the image into rectangles. The filter is applied in parallel to each part of the image. Results show that in some cases our proposition can bring over 90% benefit compared to the classical approach.
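The rectangle-division approach described in this abstract can be sketched as follows. This is an illustrative Python version using plain lists of pixel values and a simple 8-bit inversion filter (the authors' actual filters and implementation are not specified here): the image is cut into horizontal strips, each strip is filtered by a separate worker, and the results are stitched back together in order.

```python
from concurrent.futures import ThreadPoolExecutor

def invert_region(image, row_start, row_end):
    # Filter one horizontal strip independently (here: 8-bit inversion).
    return [[255 - px for px in row] for row in image[row_start:row_end]]

def parallel_filter(image, workers=4):
    # Divide the image into horizontal rectangles, filter each strip
    # in parallel, then concatenate the strips in their original order.
    h = len(image)
    bounds = [(i * h // workers, (i + 1) * h // workers)
              for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        strips = pool.map(lambda b: invert_region(image, *b), bounds)
    return [row for strip in strips for row in strip]

# A synthetic 64x64 grayscale gradient image.
img = [[(x + y) % 256 for x in range(64)] for y in range(64)]
out = parallel_filter(img)
```

Point-wise filters like this need no halo exchange between strips; a convolution filter would additionally require each worker to read a few boundary rows from its neighbours, which is where the partitioning strategy starts to matter.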