The use of Euler-Lagrange methods on unstructured grids extends their application area to more versatile setups. However, the lack of a regular topology limits the scalability of distributed parallel methods, especially for routines that perform a physical search in space. One of the most prominent slowdowns is the search for halo elements in physical space for the purpose of runtime communication avoidance. In this work, we present a new communication-free halo element search algorithm utilizing the MPI-3 shared memory model. This novel method eliminates the severe performance bottleneck of many-to-many communication during initialization compared to the distributed parallelization approach and extends the possible applications beyond those achievable with the previous approach. Building on these data structures, we then present methods for efficient particle emission, scalable deposition schemes for particle-field coupling, and latency hiding approaches. The scaling performance of the proposed algorithms is validated through plasma dynamics simulations of an open-source framework on a massively parallel system, demonstrating an efficiency of up to 80% on 131072 cores.
Recently, computing clusters based on shared-memory multiprocessors (SMPs) have become popular for high-performance computing (HPC) applications. With the recent prevalence of multi-core CPUs, which are small-scale SMPs themselves, multi-core SMP clusters will become increasingly popular in the near future. SMP clusters have characteristics of both SMPs and MPPs. Therefore, developing parallel programs that efficiently exploit the characteristics of both SMPs and MPPs in SMP clusters is a challenging task. Standard parallel programming models such as MPI, OpenMP, or a hybrid (a combination of the two former models) are commonly used for SMP clusters. Depending on the characteristics of the application, however, some programming models are better than others. Identifying and selecting a suitable programming model for an application on SMP clusters requires a certain amount of analysis of the application's behavior and performance. In this paper, we conduct experimental studies to evaluate the benefits and limits of MPI and OpenMP on three SMP-based systems using standard HPC applications parallelized with MPI, OpenMP, and the hybrid model. The performance results and final analysis may lead to an optimal programming model for the applications.
We discuss new developments of a hybrid parallel iterative sparse linear solver framework focused on petroleum reservoir flow and geomechanical simulation. It runs efficiently on several platforms, from desktop workstations to clusters of multicore nodes, with or without multiple GPUs, using a two-tier hierarchical architecture for distributed matrices and vectors. Results show good parallel scalability. Comparisons with a well-established library and a proprietary commercial solver indicate that our solver is competitive with the best available tools. We present results of the solver's application to simulations of real and synthetic reservoir models of up to billions of unknowns, running on CPUs and GPUs on up to 2000 processes.
We present a hybrid exact algorithm for the Minimal Hitting Set (MHS) Enumeration Problem for highly heterogeneous CPU-GPU-MIC platforms. With several techniques that permit an efficient exploitation of each architecture, low communication cost, and effective load balancing, we were able to enumerate MHSs for large instances in reasonable time, achieving good performance and scalability. We obtained speedups of up to 25.32 in comparison with using two six-core CPUs and we also enumerated MHSs for instances with tens of thousands of variables in less than 5 hours. We also evaluated our algorithm with a real-world driven dataset, and with a large CPU-GPU cluster, we unprecedentedly enumerated in parallel large minimal hitting sets of this dataset in less than 8 hours. These results reinforce the statement that heterogeneous clusters of CPUs, GPUs, and MICs can be used efficiently for high-performance computing.
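For readers unfamiliar with the problem, the following is a minimal sequential sketch of what enumerating minimal hitting sets means. It is a brute-force illustration only, not the hybrid CPU-GPU-MIC algorithm of the abstract; the function name `minimal_hitting_sets` and the example family are assumptions made for illustration, and the input is assumed to be a non-empty family of non-empty sets.

```python
from itertools import combinations


def minimal_hitting_sets(family):
    """Enumerate all minimal hitting sets of a family of sets (brute force).

    A hitting set intersects every set in the family; it is minimal if no
    proper subset is also a hitting set. Hypothetical helper, exponential time.
    """
    universe = sorted(set().union(*family))
    found = []
    # Candidates are visited by increasing size, so any hitting set that is
    # not a superset of an earlier result is guaranteed to be minimal.
    for size in range(1, len(universe) + 1):
        for candidate in combinations(universe, size):
            c = set(candidate)
            if any(h <= c for h in found):
                continue  # contains a smaller hitting set, hence not minimal
            if all(c & s for s in family):
                found.append(c)
    return found


# Illustrative input: three overlapping sets over the variables 1..4.
family = [{1, 2}, {2, 3}, {3, 4}]
result = minimal_hitting_sets(family)
print(result)
```

The number of candidates grows exponentially with the number of variables, which is why instances with tens of thousands of variables call for the kind of parallel, load-balanced enumeration the abstract describes.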
The maximum flow problem is a classical combinatorial problem with many applications. In this work, a hybrid parallel algorithm using both multi-core and many-core technologies for computing the maximum flow in a network is presented. The proposed implementation is applicable in OpenMP/CUDA-enabled computing environments. To improve performance, two strategies were implemented: an adaptive approach in which the algorithm alternates between GPU and CPU processing according to the number of active nodes, and implementations of the global relabeling and gap relabeling heuristics in the multi-core part. When compared against the best sequential implementation, the speedups range from 2.36 to 5.38 on several kinds of graphs. Results show that the proposed algorithm is faster than previous parallel implementations on CPUs/GPUs for all kinds of tested graphs.
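The adaptive idea can be illustrated with a small sequential sketch. It runs a plain push-relabel loop and, at each round, only records which backend a hybrid implementation might pick from the number of active nodes; no GPU or OpenMP code is executed. The function name, the `gpu_threshold` parameter, and the rule "many active nodes go to the GPU" are assumptions for illustration, and the global and gap relabeling heuristics mentioned in the abstract are omitted.

```python
from collections import defaultdict


def max_flow_push_relabel(n, edges, s, t, gpu_threshold=4):
    """Push-relabel max flow that logs a hypothetical CPU/GPU choice per round."""
    cap = defaultdict(int)            # residual capacities
    adj = [set() for _ in range(n)]   # neighbours in the residual graph
    for u, v, c in edges:
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)

    height = [0] * n
    excess = [0] * n
    height[s] = n
    for v in adj[s]:                  # saturate every edge leaving the source
        c = cap[(s, v)]
        if c > 0:
            cap[(s, v)] -= c
            cap[(v, s)] += c
            excess[v] += c

    schedule = []
    while True:
        active = [u for u in range(n) if u not in (s, t) and excess[u] > 0]
        if not active:
            break
        # Assumed rule: a large set of active nodes would be sent to the GPU.
        schedule.append("gpu" if len(active) >= gpu_threshold else "cpu")
        for u in active:
            while excess[u] > 0:      # discharge u completely
                pushed = False
                for v in adj[u]:
                    if cap[(u, v)] > 0 and height[u] == height[v] + 1:
                        d = min(excess[u], cap[(u, v)])
                        cap[(u, v)] -= d
                        cap[(v, u)] += d
                        excess[u] -= d
                        excess[v] += d
                        pushed = True
                        if excess[u] == 0:
                            break
                if not pushed:        # no admissible edge: relabel u
                    height[u] = 1 + min(
                        height[v] for v in adj[u] if cap[(u, v)] > 0
                    )
    return excess[t], schedule


# Illustrative network: node 0 is the source, node 3 is the sink.
edges = [(0, 1, 3), (0, 2, 2), (1, 2, 1), (1, 3, 2), (2, 3, 3)]
flow, schedule = max_flow_push_relabel(4, edges, s=0, t=3)
print(flow, schedule)
```

In a real hybrid implementation the recorded label would select between a CUDA kernel and an OpenMP loop for that round; here it only shows where such a decision point would sit.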