ISBN: 9781450343879 (print)
Exascale computation is the next target of high-performance computing. In the push to create exascale computing platforms, simply increasing the number of hardware devices is not an acceptable option, given the limits on power consumption and heat dissipation and given programming models that are designed for current hardware platforms. Instead, new hardware technologies, coupled with improved programming abstractions and more autonomous runtime systems, are required to achieve this goal. This position paper presents the design of a new runtime for a new heterogeneous hardware platform being developed to explore energy-efficient, high-performance computing. By extending and enhancing the OpenCL framework, this work will both simplify the programming of current and future HPC applications and automate the scheduling of data and computation across this new hardware platform. This work also explores the use of FPGAs to meet both the power and performance goals of exascale, and the use of the runtime to automatically perform dynamic configuration and reconfiguration of hardware platforms.
ISBN: 9781509035250 (print)
The superb efficiency and noise resilience of human cognition come from its extensive, highly associative memory. For example, it is easy for a human to recognize occluded or incomplete text images based on their context. Associative inference in the neocortex is a concurrent process. A serial implementation of this concurrent process not only hinders its performance but also limits the quality of recall. This paper investigates a parallel implementation of associative inference using the cogent confabulation model, which is a highly cross-dependent and cyclic knowledge network that supports probabilistic inference. By breaking the fixed processing order that is typical of sequential processing, and by introducing randomness generated from the race conditions in parallel processing, we not only reduce the runtime but also improve the accuracy. Further improvement can be achieved by scheduling the lexicon processing intermittently, which gives the changes time to settle. Using sentence construction as a case study, we demonstrate that the parallel implementation provides up to a 93.4% reduction in computation time and a 5% improvement in recall accuracy.
ISBN: 9789811001352; 9789811001338 (print)
This paper presents a simple approach to load flow analysis of a radial distribution network using parallel programming in Compute Unified Device Architecture (CUDA). The proposed approach applies Breadth First Search to traverse the nodes in the network, and Kirchhoff's Current Law (KCL) and Kirchhoff's Voltage Law (KVL) to evaluate the currents and voltages at each network node. The procedure is repeated until the convergence criterion is met. The paper demonstrates the working of Breadth First Search using CUDA. The efficiency of the load flow algorithm is enhanced by exploiting the parallel computational power of the Graphics Processing Unit (GPU). The approach has been tested on 33-node and 69-node radial distribution systems, and the performance of the sequential approach on the CPU is compared with that of the parallel approach on the GPU. The results show that introducing CUDA to load flow analysis shortens execution time while producing accurate results when compared with the sequential approach.
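The procedure described in this abstract (breadth-first ordering of the nodes, a KCL pass, a KVL pass, repeated until convergence) can be illustrated with a small sequential sketch. This is not the paper's CUDA implementation: it is a plain CPU version of the standard backward/forward sweep, and the function names, the per-unit values, and the four-node feeder are all hypothetical choices made for illustration.

```python
from collections import deque


def bfs_order(children, root=0):
    """Return the nodes in breadth-first order, starting at the source node."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children.get(node, []))
    return order


def load_flow(parent, impedance, load, v_source=1.0 + 0j, tol=1e-8, max_iter=100):
    """Backward/forward sweep on a radial network (all quantities per unit).

    parent[n]    : upstream node of n (the root, node 0, has no entry)
    impedance[n] : complex series impedance of the branch parent[n] -> n
    load[n]      : complex power demand S = P + jQ at node n
    """
    children = {}
    for node, up in parent.items():
        children.setdefault(up, []).append(node)
    order = bfs_order(children, root=0)
    voltage = {n: v_source for n in order}  # flat start

    for iteration in range(1, max_iter + 1):
        # Backward sweep (KCL): visit leaves first; each branch carries its own
        # load current plus the currents of all branches below it.
        branch_current = {}
        for node in reversed(order[1:]):
            i_load = (load.get(node, 0j) / voltage[node]).conjugate()
            downstream = sum(branch_current[c] for c in children.get(node, []))
            branch_current[node] = i_load + downstream

        # Forward sweep (KVL): visit nodes from the source outward and subtract
        # the voltage drop across each branch.
        max_change = 0.0
        for node in order[1:]:
            new_v = voltage[parent[node]] - impedance[node] * branch_current[node]
            max_change = max(max_change, abs(new_v - voltage[node]))
            voltage[node] = new_v

        if max_change < tol:
            return voltage, iteration
    raise RuntimeError("load flow did not converge")


# Hypothetical feeder: 0 -> 1, then 1 -> 2 and 1 -> 3.
example_parent = {1: 0, 2: 1, 3: 1}
example_impedance = {n: 0.01 + 0.02j for n in example_parent}
example_load = {1: 0.10 + 0.05j, 2: 0.08 + 0.04j, 3: 0.05 + 0.02j}

voltages, iterations = load_flow(example_parent, example_impedance, example_load)
for node in sorted(voltages):
    print(node, round(abs(voltages[node]), 5))
```

In a GPU version, the nodes that share a breadth-first level are independent of one another within each sweep, which is presumably where the parallelism comes from; the sketch above keeps everything sequential for clarity.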
ISBN: 9781450344333 (print)
Although parallel computing is now very common, current parallel programming methods tend to be domain-specific (specializing in certain program patterns such as nested loops) and/or manual (programmers need to specify independent tasks). This situation poses a serious difficulty in developing efficient parallel programs. We often need to manually transform code written in the usual programming patterns into a parallelizable form. We hope to establish a solid foundation to streamline this transformation. This talk first reviews the necessity of a method for systematically deriving parallelizable code and then introduces ongoing work on extending lambda calculus for this purpose. The distinguishing feature of the new calculus is a special construct that enables evaluation with incomplete information, which is useful for expressing important parallel computation patterns such as reductions (aggregations). We then investigate derivations of parallelizable code as transformations on the calculus.
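To make the kind of transformation discussed in this abstract concrete, the sketch below turns a sequential fold into a chunked reduction. It does not model the extended lambda calculus or its special construct; it only shows the familiar manual rewrite, which is valid under the assumption that the operator is associative and `identity` is its identity element. The function names are hypothetical.

```python
import operator
from concurrent.futures import ThreadPoolExecutor


def sequential_reduce(op, items, identity):
    """The usual sequential pattern: one accumulator, strict left-to-right order."""
    acc = identity
    for x in items:
        acc = op(acc, x)
    return acc


def chunked_reduce(op, items, identity, chunks=4):
    """Parallelizable form of the same reduction.

    Each chunk is reduced without knowing the results of the chunks before it;
    the partial results are combined afterwards. This equals the sequential
    result only if `op` is associative with `identity` as its identity element.
    """
    items = list(items)
    size = max(1, -(-len(items) // chunks))  # ceiling division, at least 1
    parts = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in input order, so non-commutative
        # operators such as string concatenation are still handled correctly.
        partials = list(pool.map(lambda p: sequential_reduce(op, p, identity), parts))
    return sequential_reduce(op, partials, identity)


print(chunked_reduce(operator.add, range(1, 101), 0))
print(chunked_reduce(operator.add, list("parallel"), ""))
```

Threads are used here only to show the structure; in CPython they will not speed up a CPU-bound reduction, and a process pool or a native runtime would be needed for real gains.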
The demise of frequency scaling, which was the easiest way to improve computing performance, together with the growing gap between CPU and memory speeds and the increase in arithmetic intensity of current problems, has given rise to a new range of devices created to improve performance. Heterogeneous Computing (HC) and many-cores are examples of this new range of devices. However, the complexity of these new hardware architectures is not easily hidden from the programmer. In this thesis, I propose a set of tools that seek to exploit, through source-to-source (S2S) compilers, the capabilities and peculiarities of parallel computing and HC to speed up originally sequential source code and increase its energy efficiency. The proposed modular programs are implemented as a set of tools that help port sequential source code to OpenMP, MPI, and HMPP, demonstrating how the input code can be translated automatically and effectively. Through a real-life example, I show how the proposed dependency analysis tool trivializes the task of parallelizing sequential code, breaking the first performance barrier. The OMP2MPI experiments generate code that is more than 60× faster than its sequential version and also faster than its original OpenMP code. The OMP2HMPP experiments obtain an average speedup of 31× and an average increase in energy efficiency of 5.86×. Both tools were tested with OpenMP, obtaining successful results that demonstrate the feasibility of using this set of tools for exploring HC.
ISBN: 9781509036820 (print)
Suzaku is a pattern programming framework that enables programmers to create pattern-based parallel MPI programs without writing the MPI message-passing code implicit in the patterns. The purpose of this framework is to simplify message-passing programming and create better structured programs based upon established parallel design patterns. The focus for developing Suzaku is on teaching parallel programming. This paper covers the main features of Suzaku and describes our experiences using it in parallel programming classes.
ISBN: 9781450342254 (print)
Reconstructing the genomes of organisms from high-throughput sequencing (HTS) experiments without a reference genome (de novo assembly) is a challenging problem that has been approached in several ways over the past decade. Although numerous methods are available and many offer fair reconstruction performance, there is a lack of generalized template libraries and interchangeable data structures and methods for serial, multithreaded, and distributed processing. In this work, we propose a novel set of cache-oblivious generic data structures for serial, multithreaded, and distributed processing of high-throughput sequencing data, for the creation of de Bruijn or k-mer graphs and their use in de novo assembly and related HTS data analytics problems.
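For readers unfamiliar with the k-mer and de Bruijn graphs mentioned in this abstract, the following minimal sketch builds one from a handful of reads. It uses ordinary Python dictionaries, not the cache-oblivious structures the paper proposes, and it ignores reverse complements and sequencing errors; the function names and the toy reads are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn_graph(kmer_counts):
    """Build a de Bruijn graph from distinct k-mers.

    Nodes are (k-1)-mers; each distinct k-mer contributes one edge from its
    prefix (first k-1 characters) to its suffix (last k-1 characters).
    """
    graph = {}
    for kmer in sorted(kmer_counts):
        graph.setdefault(kmer[:-1], []).append(kmer[1:])
    return graph


toy_reads = ["ACGT", "CGTA"]
kmers = count_kmers(toy_reads, k=3)
graph = de_bruijn_graph(kmers)
for node, successors in graph.items():
    print(node, "->", successors)
```

Because counting k-mers in one read is independent of every other read, the counting step splits naturally across threads or machines, with the per-worker counters merged afterwards.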
ISBN: 9781450340601 (print)
SIMD extensions were added to microprocessors in the mid '90s to speed up data-parallel code by vectorization. Unfortunately, the SIMD programming model has barely evolved, and the most efficient utilization is still obtained with elaborate intrinsics coding. As a consequence, several approaches to writing efficient and portable SIMD code have been proposed. In this work, we evaluate current programming models for the C++ language that claim to simplify SIMD programming while maintaining high performance. The proposals were assessed by implementing two kernels: one standard floating-point benchmark and one real-world integer-based application, both highly data-parallel. Results show that the proposed solutions perform well for the floating-point kernel, achieving close to the maximum possible speed-up. For the real-world application, the programming models exhibit significant performance gaps due to data type issues, missing template support, and other problems discussed in this paper.
In this paper, we present our Concurrent Systems class, where parallel programming and parallel and distributed computing (PDC) concepts have been taught for more than 20 years. Despite several rounds of changes in ha...
Parallel computing deals with the simultaneous computation of problems to reduce the total time required by serial computation. Many problems involving a huge number of computations can be subdivided into smaller ones e...