ISBN: 9798350355543 (print)
As computing enters the exascale era, various vendors have developed graphics processing units (GPUs) for large-scale HPC machines. Nine of the top 10 computers on the Top500 List have heterogeneous architectures containing GPUs as accelerators. To take advantage of these computational resources, it is essential to maintain a balance between performance and portability, i.e., making the codes functional, performant, and portable across GPU vendors. In this work, the efforts of restructuring and porting several key CPU kernels in the GAMESS quantum chemistry software package to AMD, Intel, and NVIDIA GPUs via the OpenMP API are discussed. The OpenMP directive model allows the same code to run successfully across GPUs from different vendors. However, since OpenMP is a high-level programming model, it relies on vendor implementations to handle the GPU code generation, and since implementations vary, there are large differences in performance even on the same hardware with different vendor compilers. Presented here are challenges faced when porting CPU code to GPU code with OpenMP, which other porting efforts may also face, such as memory limitations, major restructuring needed for code with branching, and differences in compiler optimizations and tuning. The strategies and approaches used to address these challenges are discussed. Performance results across the supercomputing systems Summit, Aurora, Frontier, and Perlmutter, with a variety of vendor software stacks, are presented.
Software reuse realizes the sharing of software resources, and component-based reuse is the main form of software reuse. The classification, storage, retrieval, and release of a large number of component resources req...
The graph is a general theoretical model in many large-scale data-driven applications. The SSSP (Single Source Shortest Path) algorithm is a foundation for most important algorithms and applications. GPU remains its mainstrea...
ISBN: 9781479941681 (print)
In this paper, we consider second order elliptic ODE eigenproblems on general *** construct an efficient algorithm for computing the eigenvalue by using a weighted mean combination of the linear finite element method and the corresponding 2nd-order finite difference *** first take the arithmetic mean of the two *** we compute the quasi-optimal combined parameters for different eigenvalues to improve our efficient *** algorithm we construct converges faster and has higher accuracy than the linear finite element method and the corresponding 2nd-order finite difference *** numerical examples tested on both uniform meshes and nonuniform meshes are given to illustrate the computational cost of different numerical methods for solving eigenvalue *** efficiency, all the matrices use sparse storage in our algorithm.
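To give some intuition for why averaging the two discretizations can help, here is a minimal sketch on the simplest model problem, -u'' = λu on (0, 1) with homogeneous Dirichlet conditions and a uniform mesh. This setting is an assumption chosen for illustration, not the general problem treated in the paper. For this model problem the discrete eigenvalues of both methods are known in closed form: the linear finite element value overestimates the exact eigenvalue (kπ)², the finite difference value underestimates it, and the leading error terms cancel in the arithmetic mean. The function names and the `alpha` parameter are hypothetical, and the paper's quasi-optimal weights are not reproduced here.

```python
import math


def fd_eigenvalue(k: int, n: int) -> float:
    """k-th eigenvalue of the 2nd-order finite difference scheme on n subintervals."""
    h = 1.0 / n
    return 2.0 * (1.0 - math.cos(k * math.pi * h)) / h**2


def fem_eigenvalue(k: int, n: int) -> float:
    """k-th eigenvalue of the linear finite element scheme (consistent mass matrix)."""
    h = 1.0 / n
    c = math.cos(k * math.pi * h)
    return 6.0 * (1.0 - c) / ((2.0 + c) * h**2)


def combined_eigenvalue(k: int, n: int, alpha: float = 0.5) -> float:
    """Weighted mean of the two approximations; alpha = 0.5 is the arithmetic mean."""
    return alpha * fem_eigenvalue(k, n) + (1.0 - alpha) * fd_eigenvalue(k, n)


def errors(k: int, n: int) -> tuple[float, float, float]:
    """Signed errors (FEM, FD, arithmetic mean) against the exact value (k*pi)^2."""
    exact = (k * math.pi) ** 2
    return (
        fem_eigenvalue(k, n) - exact,
        fd_eigenvalue(k, n) - exact,
        combined_eigenvalue(k, n) - exact,
    )


if __name__ == "__main__":
    # Print the signed errors of the first eigenvalue on successively finer meshes.
    for n in (8, 16, 32, 64):
        fem_err, fd_err, mean_err = errors(1, n)
        print(f"n={n:3d}  FEM {fem_err:+.3e}  FD {fd_err:+.3e}  mean {mean_err:+.3e}")
```

In this model setting, halving the mesh width reduces the error of either method alone by roughly a factor of 4, and the error of the arithmetic mean by roughly a factor of 16.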
Modeling of complex physical systems with Modelica usually leads to a high-index differential algebraic equation (DAE) system, and index reduction is an important part of solving the high-index *** structure index reduction algorithm is one of the popular methods, but in special cases, it *** relaxation algorithm can detect and correct the breakdown *** the maximum weight matching of a bipartite graph is an important part of the combinatorial relaxation *** order to choose the proper method for large-scale, dense bipartite graphs, this paper provides three implementations of the Hungarian *** experiment results and the theory show that the BFS single-augmented method is better than the others.
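As a point of reference for the maximum weight bipartite matching step, the sketch below shows one standard O(n³) form of the Hungarian method, using row and column potentials with a shortest augmenting path per row. It is not claimed to be any of the three implementations compared in the paper, and in particular it is not the BFS single-augmented variant. The function name `max_weight_matching` and the dense list-of-lists input are assumptions made for illustration. Maximization is obtained by negating the weights and solving the minimum-cost assignment.

```python
def max_weight_matching(weights):
    """Return (match, total) for a dense weight matrix with rows <= columns.

    match[i] is the column assigned to row i; total is the summed weight.
    """
    n, m = len(weights), len(weights[0])
    if n > m:
        raise ValueError("expected at most as many rows as columns")
    inf = float("inf")
    cost = [[-w for w in row] for row in weights]  # maximize by minimizing -w
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (m + 1)   # column potentials (1-based, index 0 is a dummy)
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (0 = free)
    way = [0] * (m + 1)   # predecessor column on the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            # Relax reduced costs from row i0 and pick the closest unused column.
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so the chosen edge becomes tight.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        # Flip the matching along the augmenting path.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1

    match = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            match[p[j] - 1] = j - 1
    total = sum(weights[i][match[i]] for i in range(n))
    return match, total


if __name__ == "__main__":
    example = [[7, 5, 11], [5, 4, 1], [9, 3, 2]]
    print(max_weight_matching(example))
```

For the small example matrix, the six possible assignments have weights 13, 11, 12, 15, 19, and 24, so the maximum is 24 with rows 0, 1, 2 assigned to columns 2, 1, 0.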
Building k-nearest neighbor (kNN) graphs is a necessary step in such areas as data mining and machine *** in this paper, we attempt to study kNN further: we first propose a parallel algorithm for approximate kNN graph construction and then apply the kNN graph to the application of *** show that our MPI/OpenMP mixed-mode codes can make the construction of the approximate kNN graph faster and make the parallelization and implementation ***, we compare the results of agglomerative clustering methods by using our parallel algorithm to illustrate the applicability of this method.
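For readers unfamiliar with the object being built, the following minimal sketch constructs an exact kNN graph by brute force, the O(n²) baseline that approximate construction methods are usually measured against. It is not the parallel approximate algorithm proposed in the paper and uses neither MPI nor OpenMP. The function name `knn_graph`, the use of squared Euclidean distance, and the tie-breaking by point index are all assumptions made for illustration.

```python
def knn_graph(points, k):
    """Return a list where entry i holds the indices of the k nearest neighbors of point i.

    Distances are squared Euclidean; ties are broken by the smaller index.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    graph = []
    for i, a in enumerate(points):
        # Distance from point i to every other point, paired with its index.
        candidates = [
            (sum((x - y) ** 2 for x, y in zip(a, b)), j)
            for j, b in enumerate(points)
            if j != i
        ]
        candidates.sort()  # sorts by distance, then by index
        graph.append([j for _, j in candidates[:k]])
    return graph


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
    for i, nbrs in enumerate(knn_graph(pts, 2)):
        print(i, nbrs)
```

Each row of the result is one vertex's outgoing edge list. The graph is directed in general, since point j being among the nearest neighbors of point i does not imply the reverse.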
Deep neural networks can be vulnerable to adversarial attacks, even for the mainstream Transformer-based models. Although several robustness enhancement approaches have been proposed, they usually focus on some certai...
Graph Neural Networks (GNNs) have shown great power on many practical tasks in the past few years. They are also considered a potential technique for bridging the gap between machine learning and symbolic reasoning. Ex...
The paper presents an attempt to bridge the gap between machine learning and symbolic reasoning. We build graph neural networks (GNNs) to predict the solution of the Maximum Satisfiability (MaxSAT) problem, an optimiz...