Adaptive discretizations are important in compressible/incompressible flow problems because it is often necessary to resolve details on multiple levels while modeling large regions of space with a reduced number of degrees of freedom (and thus reducing the computational time). There is a wide variety of methods for adaptively discretizing space, but Cartesian grids have often outperformed them even at high resolutions, owing to their simple and accurate numerical stencils and their superior parallel performance. Such performance and simplicity are generally obtained by applying a finite-difference scheme to the problems involved, but this discretization approach does not offer an easy path to adaptivity. A finite-volume scheme, by contrast, can incorporate different types of grids that are better suited to adaptive refinement, at the cost of more complex stencils but with greater flexibility. The Laplace operator is an essential building block of the Navier-Stokes equations, a model that governs fluid flows, but it also occurs in differential equations that describe many other physical phenomena, such as electric and gravitational potentials, and in quantum mechanics. It is therefore a very important differential operator, as the many studies devoted to it attest. This work presents 2D finite-difference and finite-volume approaches for solving problems involving the Laplace operator, applying patches of overlapping grids where a finer level is needed and leaving coarser meshes in the rest of the computational domain. These overlapping grids have generic quadrilateral shapes. Specifically, the topics covered are: 1) introduction to the finite difference method, finite volume method, domain partitioning, solution approximation; 2) overview of different types of meshes to represent in a discrete way the geometry involved in a problem, with a focus on the octree data structure, presenting PABLO and PABLitO. The first one is an
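As a rough illustration of the "simple and accurate numerical stencils" that the abstract attributes to finite differences on Cartesian grids, the following is a minimal sketch of the standard five-point Laplacian on a uniform 2D grid. It is not the scheme from the work itself (which uses overlapping quadrilateral patches); the function name `laplacian_5pt` and the test field are assumptions made for this example.

```python
def laplacian_5pt(u, h):
    """Five-point finite-difference Laplacian on a uniform grid.

    u is a list of rows (u[j][i]), h is the grid spacing in both directions.
    Only interior nodes are computed; boundary entries are left at 0.0.
    """
    ny, nx = len(u), len(u[0])
    lap = [[0.0] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            lap[j][i] = (
                u[j][i - 1] + u[j][i + 1]    # neighbours in x
                + u[j - 1][i] + u[j + 1][i]  # neighbours in y
                - 4.0 * u[j][i]
            ) / (h * h)
    return lap


# Hypothetical test field: u(x, y) = x^2 + y^2, whose exact Laplacian is 4.
# The five-point stencil is exact for quadratics, up to rounding.
h = 0.1
n = 6
u = [[(i * h) ** 2 + (j * h) ** 2 for i in range(n)] for j in range(n)]
lap = laplacian_5pt(u, h)
```

On an adaptively refined or overlapping grid the neighbour lookup is no longer a fixed index offset, which is the added stencil complexity the abstract refers to.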
Transactional Memory (TM) is an emerging programming paradigm that drastically simplifies the development of concurrent applications by relieving programmers from a major source of complexity: how to ensure correct, yet efficient, synchronization of concurrent accesses to shared memory. Despite the large body of research devoted to this area, existing TM systems still suffer from severe limitations that hamper both their performance and energy efficiency. This dissertation tackles the problem of how to build efficient implementations of the TM abstraction by introducing innovative techniques that address three crucial limitations of existing TM systems by: (i) extending the effective capacity of Hardware TM (HTM) implementations; (ii) reducing the synchronization overheads in Hybrid TM (HyTM) systems; (iii) enhancing the efficiency of TM applications via energy-aware contention management schemes. The first contribution of this dissertation, named POWER8-TM (P8TM), addresses what is arguably one of the most compelling limitations of existing HTM implementations: the inability to process transactions whose footprint exceeds the capacity of the processor's cache. By leveraging, in an innovative way, two hardware features provided by IBM POWER8 processors, namely Rollback-only Transactions and Suspend/Resume, P8TM can achieve up to 7× performance gains in workloads that stress the capacity limitations of HTM. The second contribution is DMP-TM (Dynamic Memory Partitioning-TM), a novel Hybrid TM (HyTM) that offloads the cost of detecting conflicts between HTM and Software TM (STM) to off-the-shelf operating system memory protection mechanisms. DMP-TM's design is agnostic to the STM algorithm and has the key advantage of allowing for integrating, in an efficient way, highly scalable STM implementations that would, otherwise, demand expensive instrumentation of the HTM path. This allows DMP-TM to achieve up to 20× speedups compared to state of the ar
Fortran Coarrays are a well known data structure in High Performance Computing (HPC) applications. There have been various attempts to port the concept to other programming languages that have a wider user base outsid...
HybriLIT heterogeneous platform that is a component of the Multifunctional Information and Computing Complex (MICC) of Joint Institute for Nuclear Research. HybriLIT includes GOVORUN supercomputer and education and te...
Multi-core processors offer growing potential for parallelism but make it challenging to develop programs that achieve high application performance. This paper presents a comparison of five parallel programming models for implementing parallel programs in C++ on multi-core computer systems. The models under consideration are Intel®'s Threading Building Blocks (TBB), OpenMPI, Intel®'s Cilk™ Plus, OpenMP and Pthreads. For demonstration purposes, multiple parallel implementations of a matrix multiplication algorithm suitable for parallelization were created. The main goal of this paper is a comprehensive comparison of the chosen models with respect to the following criteria: performance and coding effort required.
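The row-wise decomposition that makes matrix multiplication "suitable for parallelization" can be sketched as follows. This is only an illustration of the work-splitting idea, written in Python with the standard `concurrent.futures` module rather than in any of the C++ models the paper compares; the names `multiply_rows` and `parallel_matmul` are assumptions for this example, and CPython threads will not give the speedups a native implementation would.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(a, b, row_start, row_end):
    """Compute rows [row_start, row_end) of the product a @ b."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(row_start, row_end)
    ]


def parallel_matmul(a, b, workers=4):
    """Split the rows of a into contiguous blocks, one task per block.

    Each task writes only its own output rows, so no locking is needed.
    """
    n = len(a)
    chunk = max(1, (n + workers - 1) // workers)
    ranges = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the row blocks.
        blocks = pool.map(lambda r: multiply_rows(a, b, r[0], r[1]), ranges)
        return [row for block in blocks for row in block]


result = parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], workers=2)
```

The same partitioning maps naturally onto a parallel loop over the outer row index in models such as OpenMP or TBB, which is one reason this kernel is a common benchmark for comparing coding effort.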
In this paper, we present a new algorithm for parallel Monte Carlo tree search (MCTS). It is based on the pipeline pattern and allows flexible management of the control flow of the operations in parallel MCTS. The pip...
PARFOR parallelizing compiler is the main part of SAPFOR (System For Automated parallelization). This compiler can be applied as a stand-alone tool to exploit implicit parallelism in traditional sequential languages F...
Computing capabilities continue to increase with the availability of multi-core and many-core processors. The wide availability of multi-core processors has made parallel programming possible for end-user applications running on desktops, workstations, and mobile devices. While parallel hardware has become common, software that exploits parallel capabilities is just beginning to take hold. Multimedia applications, with their data-parallel nature and large computing requirements, will benefit significantly from parallel programming. This paper presents an overview of parallel programming and introduces languages and tools for parallel programming, such as OpenMP and CUDA, within the scope of multimedia applications.
Heterogeneous parallel architectures present many challenges to application developers. One of the most important ones is the decision where to execute a specific task. As today's systems are often dynamic in natu...
The usage of open-source and easy-to-modify programme code for spatial visualisation of fast-evolving physical phenomena directly in the course of supercomputer calculations becomes relevant in design, research, const...