ISBN (print): 9781479953134
Research on high-level parallel programming approaches systematically evaluates the performance of applications written using these approaches and informally argues that high-level parallel programming languages or libraries increase programmer productivity. In this paper we present a methodology for evaluating the trade-off between programming effort and performance of applications developed using different programming models. We apply this methodology to several implementations of a function solving the all nearest smaller values problem. The high-level implementation is based on a new version of the BSP homomorphism algorithmic skeleton.
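For reference, the all nearest smaller values (ANSV) problem the implementations above solve can be stated concretely. Below is a minimal sequential, stack-based sketch (function name and output convention are ours, not the paper's); the paper's high-level version parallelizes this computation with a BSP homomorphism skeleton.

```python
def all_nearest_smaller_values(xs):
    """For each element of xs, return the nearest value to its left
    that is strictly smaller, or None if no such value exists."""
    stack = []    # candidate smaller values, kept strictly increasing
    result = []
    for x in xs:
        # Discard candidates that are not smaller than the current element.
        while stack and stack[-1] >= x:
            stack.pop()
        result.append(stack[-1] if stack else None)
        stack.append(x)
    return result

print(all_nearest_smaller_values([0, 8, 4, 12, 2, 10, 6, 14]))
# [None, 0, 0, 4, 0, 2, 2, 6]
```

Each element is pushed and popped at most once, so the sequential version runs in linear time; the parallel challenge is distributing the stack discipline across processors.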
Parallel programming is commonly done through a library approach, as in the Message Passing Interface (MPI), directives, as in OpenMP, language extensions, as in High Performance Fortran (HPF), or whole new languages,...
ISBN (print): 9781450332170
The increased presence of parallel computing platforms brings concerns to the general purpose domain that were previously prevalent only in the specific niche of high-performance computing. As parallel programming technologies become more prevalent in the form of new emerging programming languages and extensions of existing languages, additional safety concerns arise as part of the paradigm shift from sequential to parallel behaviour. In this paper, we propose various syntax extensions to the Ada language, which provide mechanisms whereby the compiler is given the necessary semantic information to enable the implicit and explicit parallelization of code. The model is based on earlier work, which separates parallelism specification from concurrency implementation, but proposes an updated syntax with additional mechanisms to facilitate the development of safer parallel programs. Copyright 2014 ACM.
The advantages of the Monte Carlo method for reactor analysis are well known, but full-core reactor analysis poses challenges in computation time and computer memory. Meanwhile, the exponential growth of computer pow...
Python has become the de facto language for scientific computing. Programming in Python is highly productive, mainly due to its rich science-oriented software ecosystem built around the NumPy module. As a result, the demand for Python support in High-Performance Computing (HPC) has skyrocketed. However, the Python language itself does not necessarily offer high performance. This work presents a workflow that retains Python's high productivity while achieving portable performance across different architectures. The workflow's key features are HPC-oriented language extensions and a set of automatic optimizations powered by a data-centric intermediate representation. We show performance results and scaling across CPU, GPU, FPGA, and the Piz Daint supercomputer (up to 23,328 cores), with 2.47x and 3.75x speedups over previous-best solutions, first-ever Xilinx and Intel FPGA results of annotated Python, and up to 93.16% scaling efficiency on 512 nodes. Our benchmarks were reproduced in the Student Cluster Competition (SCC) during the Supercomputing Conference (SC) 2022. We present and discuss the student teams' results.
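The abstract above does not show its annotation syntax, so as an illustration only: the kind of NumPy kernel such HPC-oriented Python workflows typically take as input is a type-annotated array function like the one below (the function and its shapes are hypothetical examples, not the paper's API).

```python
import numpy as np

def jacobi_step(a: np.ndarray) -> np.ndarray:
    """One step of a 1-D three-point Jacobi stencil, written in plain
    NumPy slicing so a data-centric compiler can lower it to fast code."""
    out = a.copy()
    out[1:-1] = (a[:-2] + a[1:-1] + a[2:]) / 3.0
    return out

print(jacobi_step(np.array([0.0, 0.0, 9.0, 0.0])))
# [0. 3. 3. 0.]
```

Keeping the kernel in idiomatic slice notation, rather than Python loops, is what lets such toolchains recover the data movement structure needed for CPU, GPU, and FPGA code generation.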
PARRAY (or parallelizing ARRAYs) is an extension of C language that supports system-level succinct programming for heterogeneous parallel systems. Parray extends mainstream C programming with novel array types. This l...
Spectral clustering algorithms have been used in various research domains to discover structure and patterns in data. However, high computational and space complexity hinders their usage for large-scale datasets in machine learning and bioinformatics. Various approximate spectral clustering methods were proposed in the open literature to solve those problems. In this paper, we describe our GPU-based, parallel implementation of an approximate spectral algorithm based on the Nystrom method and column sampling and its memory-efficient variant. We evaluate our solution using several annotated datasets, such as USPS, MNIST, and MNIST8, as well as bioinformatics data, especially from the domain of single-cell and spatial transcriptomics. We obtain speedups of up to 31.8x depending on the dataset used and demonstrate the scalability of the solution for the datasets with up to four million samples.
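The core of the Nyström approach described above can be sketched on the CPU with NumPy: sample m landmark points, eigendecompose the small landmark affinity block, and extend the eigenvectors to all n points. This is an illustrative sketch, not the paper's GPU implementation; the function name, the RBF affinity choice, and the parameters are our assumptions.

```python
import numpy as np

def nystrom_embedding(X, m, k, gamma=1.0, seed=0):
    """Approximate the top-k eigenvectors of an n x n RBF affinity
    matrix using only m sampled landmark columns (Nystrom method)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)        # landmark sample
    # Affinities between all points and the landmarks: C is n x m.
    d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1)
    C = np.exp(-gamma * d)
    W = C[idx]                                         # m x m landmark block
    # Eigendecompose the small block, then extend to all n points.
    vals, vecs = np.linalg.eigh(W)
    top = np.argsort(vals)[::-1][:k]
    U = C @ vecs[:, top] / vals[top]                   # approximate eigenvectors
    # Row-normalize, as in standard spectral clustering; then run
    # k-means on the rows of U to obtain cluster labels.
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    return U
```

The point of the approximation is that only the n x m block C is ever materialized, which is what makes memory-efficient GPU variants for multi-million-sample datasets feasible.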
Multi-core architectures have increased the power of parallelism by coupling many cores in a single chip. This makes it even more complex for developers to exploit the available parallelism in order to provide high per...
Several algorithms applied to the solution of specific problems in physics require high-performance computing. This is the case, for example, in the field of digital image processing, where the required performance in terms of speed, and sometimes running in a real-time environment, leads to the use of parallel programming tools. To meet this demand it is important to understand these tools, highlighting their differences and possible applications. Moreover, research centers around the world have clusters of computers, or multi-core platforms, with strong potential for using parallel programming techniques. This study aims to characterize the thread and fork parallel programming techniques. Both techniques allow the development of parallel codes, each with its own restrictions on inter-process communication and programming format. This Technical Note aims to highlight the use of each of these techniques, and to present an application in the area of image processing in which they were used. The application part of this work was developed in international collaboration with the JET Laboratory (Joint European Torus of the European Atomic Energy Community / EURATOM). The JET Laboratory investigates the process of forming the plasma and its instability, which appears as a toroidal ring of increased radiation, known as MARFE (Multifaceted Asymmetric Radiation From The Edge). The activities have explored the techniques of parallel programming algorithms in digital image processing. The presented algorithms achieve a processing rate higher than 10 000 images per second and use threads and shared-memory communication between independent processes, which is equivalent to fork.
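The thread-versus-fork distinction above can be sketched in a few lines: threads share the process address space and can mutate a plain buffer directly, while a forked child process needs an explicitly shared-memory object to publish its result. This is an illustrative sketch (the toy `brighten` kernel and function names are ours, not from the note), using Python's threading and multiprocessing modules as stand-ins for the POSIX primitives.

```python
import threading
import multiprocessing as mp

def brighten(pixels, offset):
    """Toy stand-in for an image-processing kernel: add offset to each
    pixel value, clamped to 255."""
    for i in range(len(pixels)):
        pixels[i] = min(255, pixels[i] + offset)

def run_with_thread(data, offset):
    # Threads share the address space: the list is mutated in place.
    t = threading.Thread(target=brighten, args=(data, offset))
    t.start()
    t.join()
    return data

def run_with_process(data, offset):
    # A child process gets its own address space; a multiprocessing.Array
    # backed by shared memory lets the child publish its result, mirroring
    # fork plus shared-memory communication.
    shared = mp.Array('i', data)
    p = mp.Process(target=brighten, args=(shared, offset))
    p.start()
    p.join()
    return list(shared)

if __name__ == "__main__":
    print(run_with_thread([10, 20, 30, 250], 10))   # [20, 30, 40, 255]
    print(run_with_process([10, 20, 30, 250], 10))  # [20, 30, 40, 255]
```

Both paths produce the same result; the difference is that the fork path pays for explicit shared-memory setup, while the thread path pays for the care needed when multiple threads touch the same buffer concurrently.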
The evolution of Graphics Processing Units (GPUs) has allowed the industry to overcome long-lasting problems and challenges. Many belong to the stream processing domain, whose central aspect is continuously receiving and processing data from streaming data producers such as cameras and sensors. Nonetheless, programming GPUs is challenging because it requires deep knowledge of many-core programming, mechanisms, and optimizations for GPUs. Current GPU programming standards do not target stream processing and present programmability and code portability limitations. Among our main scientific contributions is GSParLib, a C++ multi-level programming interface unifying CUDA and OpenCL for GPU processing of stream and data parallelism with negligible performance losses compared to manual implementations. GSParLib is organized in two layers: one for general-purpose computing and another for high-level structured programming based on parallel patterns. Further contributions include a methodology to provide unified and driver-agnostic interfaces while minimizing performance losses; a set of parallelism strategies and optimizations for GPU processing targeting stream and data parallelism; and new experiments covering GPU performance on applications exposing stream and data parallelism.