ISBN: 9781479932795 (print)
To reduce the complexity of traditional multithreaded parallel programming, this paper explores task-based parallel programming with the Microsoft .NET Task Parallel Library (TPL). First, it proposes a custom data partitioning optimization method for efficient data parallelism and applies it to matrix multiplication; the results support the method. It then develops a task-parallel application, Image Blender, which illustrates both the efficiency and the pitfalls of task parallelism. Finally, the paper analyzes the performance of both applications. Experimental results show that TPL's task-based programming model can substantially reduce programmer burden and improve program performance.
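The abstract does not spell out the partitioning scheme, so the following is only a minimal sketch of the general idea: split the rows of the result matrix into a few contiguous ranges and hand each range to one task, rather than creating one task per row. It is written in Python with `concurrent.futures` as a stand-in for TPL; the names `row_ranges`, `multiply_rows`, and `parallel_matmul` are hypothetical, and in CPython the threads illustrate the structure rather than deliver a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def row_ranges(n_rows, n_chunks):
    """Split range(n_rows) into at most n_chunks contiguous, near-equal ranges."""
    n_chunks = max(1, min(n_chunks, n_rows))
    base, extra = divmod(n_rows, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        # The first `extra` ranges get one additional row.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def multiply_rows(a, b, start, stop):
    """Compute rows [start, stop) of the matrix product a * b."""
    cols = list(zip(*b))  # columns of b, so each dot product walks two flat sequences
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a[start:stop]]


def parallel_matmul(a, b, n_chunks=4):
    """Multiply a and b, giving one contiguous row range to each task."""
    if not a:
        return []
    ranges = row_ranges(len(a), n_chunks)
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        # pool.map preserves the order of the ranges, so rows come back in order.
        parts = pool.map(lambda r: multiply_rows(a, b, *r), ranges)
        return [row for part in parts for row in part]
```

Coarser ranges mean fewer tasks and less scheduling overhead per unit of work, which is the usual motivation for custom partitioning over per-element tasks.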
Parallelizing industrial simulation codes such as EUROPLEXUS, software dedicated to the analysis of fast transient phenomena, is challenging. In this paper we focus on efficient parallelization on a multi-core shared-memory node. We propose that each thread gather the data it needs to process a given iteration range before advancing the computation by one time step on that range. This lazy, cache-aware layout construction preserves the original data structure and requires only very localized code modifications. We show that this approach can reduce execution time by up to 40% when the task size is set so that the data fits in the L2 cache.
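The gather-then-compute pattern described here can be sketched as follows. This is an illustration under stated assumptions, not the EUROPLEXUS implementation: the function name `advance_one_step`, the neighbour-list input, and the toy relaxation update are all hypothetical placeholders, and the chunks are processed in a plain loop where a real code would hand each chunk to a thread.

```python
def advance_one_step(values, neighbours, chunk_size, dt=0.1):
    """Advance every element by one explicit time step, chunk by chunk.

    values:      current state, one float per element (original layout, untouched)
    neighbours:  neighbours[i] is the non-empty list of indices element i reads
    chunk_size:  elements per task (in practice tuned so the buffer fits in cache)
    """
    new_values = list(values)
    for start in range(0, len(values), chunk_size):
        stop = min(start + chunk_size, len(values))

        # Gather phase: copy the scattered inputs of this range into a
        # compact local buffer and remember where each one landed.
        slot = {}
        local = []
        for i in range(start, stop):
            for j in (i, *neighbours[i]):
                if j not in slot:
                    slot[j] = len(local)
                    local.append(values[j])

        # Compute phase: the update reads only from the local buffer.
        for i in range(start, stop):
            mean = sum(local[slot[j]] for j in neighbours[i]) / len(neighbours[i])
            own = local[slot[i]]
            new_values[i] = own + dt * (mean - own)  # placeholder update rule
    return new_values
```

Because the gather happens lazily inside each chunk, the global arrays keep their original structure and only the loop body changes, which matches the "very localized code modifications" the abstract mentions.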
The HPCmatlab framework has been developed for distributed-memory programming in Matlab/Octave using the Message Passing Interface (MPI). The communication routines in the MPI library are implemented using MEX wrappers. Point-to-point, collective, and one-sided communication are supported. Benchmarking results show better performance than the MathWorks Distributed Computing Server. HPCmatlab has been used to successfully parallelize and speed up Matlab applications developed for scientific computing. The application results show good scalability while preserving ease of programming. HPCmatlab also enables shared-memory programming using Pthreads and parallel I/O using the ADIOS package.
This paper aims to reduce execution time by implementing several time-consuming applications on NVIDIA Graphics Processing Units (GPUs) using the NVIDIA Compute Unified Device Architecture (CUDA) parallel programming model. CUDA provides a platform for developing high-end parallel applications on NVIDIA GPUs. The paper implements various image processing algorithms on both the Central Processing Unit (CPU) and the GPU. The point-to-point (per-pixel) algorithms implemented are a brightening filter, a darkening filter, a negative filter, and an RGB-to-grayscale filter. Convolution algorithms, which also consider the values of neighboring pixels, are implemented as well: a Sobel filter for edge detection, a low-pass filter, and a high-pass filter. The performance of these algorithms is analyzed on both CPU and GPU using images with a resolution of 3000 × 3000. Color images are used for the point-to-point algorithms and grayscale images for all convolution algorithms. For the point-to-point algorithms, performance is also analyzed while varying the number of threads per block. Recursive ray tracing is also implemented on the GPU and shows a performance gain over the serial algorithm run on the CPU.
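To make the distinction between the two filter families concrete, here is a small serial sketch in Python (not the paper's CUDA code). Point operations depend on one pixel only, while the Sobel filter reads a 3 × 3 neighbourhood; in both cases each output pixel is independent, which is what allows a GPU version to assign one thread per pixel. The function names are hypothetical, and the grayscale weights (0.299, 0.587, 0.114) are a common convention assumed here, since the abstract does not state which formula was used.

```python
def negative(pixel):
    """Point operation: invert each 8-bit channel of an (r, g, b) pixel."""
    r, g, b = pixel
    return (255 - r, 255 - g, 255 - b)


def to_gray(pixel):
    """Point operation: weighted luminance (the weights are an assumption)."""
    r, g, b = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def map_pixels(image, op):
    """Apply a point operation to every pixel of a row-major image."""
    return [[op(p) for p in row] for row in image]


SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel(gray):
    """Neighbourhood operation: gradient magnitude, clamped to 255.

    Border pixels are left at 0. The kernels are applied without flipping
    (correlation), which does not change the magnitude for Sobel.
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    v = gray[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * v
                    gy += SOBEL_Y[dy][dx] * v
            out[y][x] = min(255, int((gx * gx + gy * gy) ** 0.5))
    return out
```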
The design and test of Multi-Processor Systems-on-Chip (MPSoCs), together with the development of distributed applications and operating systems that run on these hardware platforms, is one of the biggest challenges in today's system design. This applies in particular when short time-to-market constraints severely limit exploration of the design space. Virtual platforms can help shorten development and test cycles. In this paper, we present a cloud-based environment that supports the user in designing heterogeneous MPSoCs and developing distributed applications. To this end, the design environment generates virtual platforms automatically, allowing fast prototyping cycles, especially in software development, and exports the design to a hardware flow that synthesizes compatible FPGA designs. Extending the peripheral models with debug information supports the developer during test and debug cycles and avoids the need to add special debug code to the application. This improves the readability, portability, and maintainability of the resulting software. Additionally, this paper presents the benefits of using cloud-based design environments in engineering training and education. For that purpose, the framework supports testing the system, including complex software stacks, with prerecorded data or testbenches.
In this paper we investigate the current compiler technologies supporting OpenMP 4.x features targeting a range of devices, in particular, the Cray compiler 8.5.0 targeting an Intel Xeon Broadwell and NVIDIA K20x, IBM...
Recently, President Obama issued an Executive Order to ensure the United States' leadership in computing. The necessary hardware and software design skills should be introduced into university curricula. Over the past decades, computing has advanced to High Performance Computing (HPC). However, undergraduate students still lack experience with how HPC works, especially at minority-serving institutions, because current computing curricula do not adequately cover HPC content. To address this problem, a team of faculty members obtained external funding to improve undergraduate computing education through enhanced courses and research opportunities. The goal is to incorporate HPC concepts and training across the computing curricula of multiple disciplines in order to stimulate students' interest in computing and improve their problem-solving skills. This three-year project has completed its second year of implementation. During the first year, a diverse teaching environment was established, including an HPC cluster and embedded HPC platforms. Both platforms supported students' learning and research in parallel programming, embedded systems design, and data cloud. In the second year, several courses were revised or developed across three departments: Electrical and Computer Engineering, Computer Science, and Engineering Technology. New course materials integrating parallel and distributed computing concepts were developed and offered to undergraduate students. Project-based learning was introduced into the classroom. Undergraduate students explored more advanced concepts, such as computer vision and machine learning. At the same time, the research results were disseminated in junior- and senior-level courses. Faculty members applied effective pedagogy to teach new-generation computing. For all classes involved in this project, student surveys were collected to guide future project implementation. This article s
ISBN: 9789897580369 (print)
HiPro-CodeGen is a code generation engine designed for numerical simulation development. Its central objective is to produce a parallel software framework with a standard structure for an application developed on JASMIN, a domain-specific computational framework. The application's parallel part and all of its interfaces are generated, so the implementation of the sequential subroutines is the only code the programmer has to write by hand. The paper introduces the design and implementation of the code generation engine, which combines numerical mathematics with component-based programming to create ontological models for parallel simulations. Based on the engine's working mechanism, a hybrid programming method is proposed that combines graphical and textual approaches to hide parallel programming and object-oriented programming from developers. A real application is presented to show the effectiveness and efficiency of the engine.
This paper describes the design, implementation and testing of "Danse-doigts", an edutainment therapeutic application for hemiplegic children. The objective of this program is twofold. Firstly, to allow them...
Graphics processing unit (GPU) accelerated computing has opened a new direction of research for various combinatorial optimization problems. One such computationally demanding problem is protein structure prediction (PSP), which is NP-complete. Computational prediction of a protein's native structure from its primary amino acid sequence is termed the ab initio PSP problem. To date, wet-lab experiments on PSP indicate that existing methods are time-consuming and expensive. As a consequence, the structures of only 1% of sequences are known. This work presents a parallel programming approach with GPU computing for PSP using the 2D triangular hydrophobic-polar (HP) lattice model. The implementation of the proposed approach is tested on a set of HP benchmark sequences with lengths ranging from 25 to 100. The experimental results show that the proposed approach significantly improves prediction performance, with a large drop in computation time.
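For readers unfamiliar with the 2D triangular HP model, the sketch below shows how a candidate fold can be scored. It assumes the common convention of an energy of -1 for every pair of hydrophobic (H) residues that are lattice neighbours but not consecutive in the chain; the abstract does not state its exact energy function, and the move encoding and the names `fold` and `hp_energy` are hypothetical. Each candidate is scored independently, which is the property a GPU search can exploit.

```python
# Six neighbour offsets of a 2D triangular lattice in axial coordinates.
TRI_NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def fold(moves):
    """Turn a list of neighbour indices (0-5) into lattice positions.

    Returns None if the walk revisits a site, i.e. is not self-avoiding.
    """
    pos = [(0, 0)]
    seen = {(0, 0)}
    for m in moves:
        dx, dy = TRI_NEIGHBOURS[m]
        nxt = (pos[-1][0] + dx, pos[-1][1] + dy)
        if nxt in seen:
            return None
        seen.add(nxt)
        pos.append(nxt)
    return pos


def hp_energy(sequence, positions):
    """Return minus the number of H-H contacts between non-consecutive residues."""
    index = {p: i for i, p in enumerate(positions)}
    contacts = 0
    for i, (x, y) in enumerate(positions):
        if sequence[i] != "H":
            continue
        for dx, dy in TRI_NEIGHBOURS:
            j = index.get((x + dx, y + dy))
            # j > i + 1 counts each pair once and skips chain neighbours.
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return -contacts
```

For a sequence of length n, `moves` has n - 1 entries, so a search procedure only needs to explore move lists and discard those for which `fold` returns None.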