ISBN: 9783030369873; 9783030369866 (print)
Graph analytics is important in many domains, including social networks, computer networks, and computational biology. This paper describes the challenges of programming graph algorithms for distributed systems with CPU, GPU, and multi-GPU machines, and how to address them. It emphasizes how language abstractions and good compilation can ease programming graph analytics on such platforms without sacrificing implementation efficiency.
ISBN: 9781728199986 (print)
Electronic structure calculations based on density-functional theory (DFT) represent a significant part of today's HPC workloads and place high demands on high-performance computing resources. To perform these quantum-mechanical DFT calculations on complex large-scale systems, so-called linear scaling methods are required instead of conventional cubic scaling methods. In this work, we take up the idea of the submatrix method and apply it to the DFT computations in the software package CP2K. For that purpose, we transform the underlying numeric operations on distributed, large, sparse matrices into computations on local, much smaller and nearly dense matrices. This allows us to exploit the full floating-point performance of modern CPUs and to make use of dedicated accelerator hardware, where performance was previously limited by memory bandwidth. We demonstrate both the functionality and the performance of our implementation and show how it can be accelerated with GPUs and FPGAs.
Multicore computers can easily manipulate numbers with few digits, but computation becomes more complex as numbers grow, because the sizes of CPU registers and buses are limited. As a result, arithmetic operations such as addition, subtraction, multiplication, and division become harder for the CPU to perform. A number of algorithms have been developed for computing on large numbers. However, the existing algorithms are noticeably slow because they operate on individual bits and are designed to run on single-core computers only. In this paper, an AI model is presented that performs computation on tokens of 8-digit numbers to help boost CPU computation performance.
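To illustrate what operating on 8-digit tokens rather than on individual bits can look like, the following is a minimal sketch of schoolbook addition over 8-digit chunks. It is not the model described in the abstract; the names `CHUNK_DIGITS`, `to_chunks`, and `add_big` are hypothetical, and the inputs are assumed to be non-negative decimal strings.

```python
CHUNK_DIGITS = 8            # assumed token width, taken from the abstract
BASE = 10 ** CHUNK_DIGITS   # each token holds a value in [0, 10^8)


def to_chunks(number: str) -> list[int]:
    """Split a non-negative decimal string into 8-digit tokens,
    least significant token first."""
    chunks = []
    for end in range(len(number), 0, -CHUNK_DIGITS):
        start = max(0, end - CHUNK_DIGITS)
        chunks.append(int(number[start:end]))
    return chunks


def add_big(a: str, b: str) -> str:
    """Add two non-negative decimal strings token by token with carry."""
    x, y = to_chunks(a), to_chunks(b)
    result = []
    carry = 0
    for i in range(max(len(x), len(y))):
        total = carry
        if i < len(x):
            total += x[i]
        if i < len(y):
            total += y[i]
        result.append(total % BASE)   # low 8 digits stay in this token
        carry = total // BASE         # overflow moves to the next token
    if carry:
        result.append(carry)
    # Zero-pad every token to 8 digits, then drop leading zeros of the whole number.
    digits = "".join(f"{c:0{CHUNK_DIGITS}d}" for c in reversed(result))
    return digits.lstrip("0") or "0"
```

Each token fits comfortably in a 64-bit register, so one machine addition handles eight decimal digits at a time; only the carry links neighbouring tokens.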
ISBN: 9783030532918; 9783030532901 (print)
In this paper, we propose a software tool, called AMYTISS, implemented in C++/OpenCL, for designing correct-by-construction controllers for large-scale discrete-time stochastic systems. This tool is employed to (i) build finite Markov decision processes (MDPs) as finite abstractions of given original systems, and (ii) synthesize controllers for the constructed finite MDPs satisfying bounded-time high-level properties including safety, reachability and reach-avoid specifications. In AMYTISS, scalable parallel algorithms are designed such that they support the parallel execution within CPUs, GPUs and hardware accelerators (HWAs). Unlike all existing tools for stochastic systems, AMYTISS can utilize high-performance computing (HPC) platforms and cloud-computing services to mitigate the effects of the state-explosion problem, which is always present in analyzing large-scale stochastic systems. We benchmark AMYTISS against the most recent tools in the literature using several physical case studies including robot examples, room temperature and road traffic networks. We also apply our algorithms to a 3-dimensional autonomous vehicle and 7-dimensional nonlinear model of a BMW 320i car by synthesizing an autonomous parking controller.
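The bounded-time safety synthesis mentioned above can be sketched as a finite-horizon dynamic program on a finite MDP. This is a generic textbook-style illustration, not the AMYTISS API; the function name `safety_value_iteration` and the data layout `P[s][a][s_next]` are assumptions made for the example.

```python
def safety_value_iteration(P, safe, horizon):
    """Maximal probability of staying inside `safe` for `horizon` steps.

    P[s][a] is a list of next-state probabilities for state s and action a
    (hypothetical layout); `safe` is a set of state indices.
    Returns (V, policy), where policy[k][s] is the action chosen at time k.
    """
    n = len(P)
    # With zero steps left, a state is safe with probability 1 or 0.
    V = [1.0 if s in safe else 0.0 for s in range(n)]
    policy = []
    for _ in range(horizon):
        new_V = [0.0] * n
        step_policy = [None] * n
        # Each state is updated independently of the others in this sweep,
        # which is the part a parallel implementation could distribute.
        for s in range(n):
            if s not in safe:
                continue  # leaving the safe set is irrecoverable
            best_action, best_value = None, -1.0
            for a, row in enumerate(P[s]):
                value = sum(p * v for p, v in zip(row, V))
                if value > best_value:
                    best_action, best_value = a, value
            new_V[s] = best_value
            step_policy[s] = best_action
        V = new_V
        policy.append(step_policy)
    policy.reverse()  # the last sweep corresponds to the first time step
    return V, policy
```

For a two-state chain in which the safe state keeps itself with probability 0.9 per step, three steps give a safety probability of 0.9 cubed from that state.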
ISBN: 9781728169262 (print)
The t-Distributed Stochastic Neighbor Embedding (t-SNE) is a widely used technique for dimensionality reduction; however, its application to large datasets is still an issue. To address this, BH-tSNE was proposed, a successful approximation in which the Barnes-Hut algorithm replaces a step of t-SNE that has quadratic time complexity. Even so, this improvement still has limitations when processing large data volumes (millions of records). Recent studies, such as t-SNE-CUDA, have used GPUs to implement highly parallel BH-tSNE. In this research, a new GPU BH-tSNE implementation is proposed using efficient memory access strategies and recent acceleration techniques. Moreover, the embedding of multidimensional data points into three-dimensional space is supported. We examine scalability issues in one of the most expensive steps of GPU BH-tSNE. Our design allows up to 340% faster execution when compared to the t-SNE-CUDA implementation.
ISBN: 9780738110424 (print)
With the evolution of High Performance Computing, multi-core and many-core systems are now a common feature of new hardware architectures. The introduction of a very large number of cores at the processor level is challenging because it requires handling parallelism at multiple levels, both coarse and fine, to take full advantage of the available computing power. The resulting programming effort can be addressed with parallel programming models based on the data flow model and the task programming paradigm [1]. To do so, many of the standard numerical algorithms must be revisited, as they cannot be easily parallelized at the finest levels. Iterative linear solvers are a key part of petroleum reservoir simulation, as they can represent up to 80% of the total computing time. In these algorithms, the standard preconditioning methods for large, sparse and unstructured matrices, such as Incomplete LU Factorization (ILU) or Algebraic Multigrid (AMG), fail to scale on shared-memory architectures with a large number of cores. In this paper we reconsider preconditioning algorithms to better introduce multi-level parallelism: at the coarse level with MPI, at the fine level with threads, and at the instruction level to enable SIMD optimizations. This paper illustrates how we enhance the implementation of preconditioners such as the multi-level domain decomposition (DDML) preconditioners [2], based on the popular Additive Schwarz Method (ASM), or the classical ILU0 preconditioner with the fine-grained parallel fixed-point variant presented in [3]. Our approach is validated on linear systems extracted from realistic petroleum reservoir simulations. The robustness of the preconditioners is tested with respect to the data heterogeneities of the study cases. We evaluate the extensibility of our implementation with respect to model size and its scalability with respect to the large number of cores provided by new KNL processors or multi-node clusters.
Numerical modeling of nonequilibrium state-to-state carbon dioxide kinetics is a challenging, time-consuming computational task that involves solving a huge system of stiff differential equations and requires optimized solution methods. In the present study, we propose and analyse optimizations for the Extended Backward Differentiation Formula (EBDF) scheme. Using adaptive timesteps instead of fixed ones reduces the number of steps in the algorithm by a factor of many thousands, although each step becomes more complex. The use of parallel computations to calculate relaxation terms further reduces the computation time. Numerical experiments on the modeling of spatially homogeneous carbon dioxide vibrational relaxation were performed for optimized computational schemes of different orders. Based on them, the best-performing algorithm was recommended: a parallel fourth-order EBDF scheme with an adaptive timestep. This method requires less computation time and memory and is highly stable.
We propose a parallel version of the cross interpolation algorithm and apply it to calculate high-dimensional integrals motivated by the Ising model in quantum physics. In contrast to mainstream approaches, such as Monte Carlo and quasi Monte Carlo, the samples calculated by our algorithm are neither random nor arranged in a regular lattice. Instead we calculate the given function along individual dimensions (modes) and use these values to reconstruct its behaviour in the whole domain. The positions of the calculated univariate fibres are chosen adaptively for the given function. The required evaluations can be executed in parallel along each mode (variable) and over all modes. To demonstrate the efficiency of the proposed method, we apply it to compute high-dimensional Ising susceptibility integrals, arising from asymptotic expansions for the spontaneous magnetisation in the two-dimensional Ising model of ferromagnetism. We observe strong superlinear convergence of the proposed method, while the MC and qMC algorithms converge sublinearly. Using multiple precision arithmetic, we also observe exponential convergence of the proposed algorithm. Combining high-order convergence, almost perfect scalability up to hundreds of processes, and the same flexibility as MC and qMC, the proposed algorithm can be a new method of choice for problems involving high-dimensional integration, e.g. in statistics, probability, and quantum physics. (C) 2019 The Authors. Published by Elsevier B.V.
ISBN: 9781728190938 (digital); 9781728190938 (print)
Asynchronous stochastic gradient descent (ASGD) usually works in the centralized setting, in which workers retrieve data from a shared training set. This paper focuses on decentralized scenarios where each worker only accesses a subset of the whole training set. We find that, due to the heterogeneous properties of the decentralized setting, ASGD will optimize in wrong directions and thus obtain poor solutions. To tackle this issue, a novel algorithm, DASGD, is proposed for the above setting. Our key idea is to form an asymptotically unbiased, accurate gradient estimate by reweighting stochastic gradients using importance sampling. Numerical results substantiate the performance of the proposed algorithm in the decentralized setting.
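The bias and its correction by importance weighting can be illustrated with a small expected-value calculation. This is a generic sketch of importance sampling, not the DASGD algorithm itself; it assumes that worker k holds `shard_sizes[k]` samples, contributes each asynchronous update with probability `update_probs[k]`, and has a scalar local gradient. All names are hypothetical.

```python
def importance_weights(shard_sizes, update_probs):
    """Weight for worker k: (target share of data) / (share of updates)."""
    total = sum(shard_sizes)
    return [(n / total) / q for n, q in zip(shard_sizes, update_probs)]


def expected_update(local_grads, update_probs, weights=None):
    """Expected gradient applied by the server per update.

    Without weights, fast workers dominate the expectation; with the
    importance weights above, each worker counts by its data share.
    """
    if weights is None:
        weights = [1.0] * len(local_grads)
    return sum(q * w * g for q, w, g in zip(update_probs, weights, local_grads))


def full_gradient(local_grads, shard_sizes):
    """Gradient of the global objective: data-size-weighted average."""
    total = sum(shard_sizes)
    return sum((n / total) * g for n, g in zip(shard_sizes, local_grads))


# Two equally sized shards with opposing gradients; worker 0 is four times faster.
shard_sizes = [100, 100]
local_grads = [1.0, -1.0]
update_probs = [0.8, 0.2]

biased = expected_update(local_grads, update_probs)
weights = importance_weights(shard_sizes, update_probs)
corrected = expected_update(local_grads, update_probs, weights)
target = full_gradient(local_grads, shard_sizes)
```

In this example the unweighted expectation is pulled toward the faster worker's gradient, while the reweighted expectation matches the data-size-weighted full gradient.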
ISBN: 9783030532888; 9783030532871 (print)
Reachability analysis is a critical tool for the formal verification of dynamical systems and the synthesis of controllers for them. Due to their computational complexity, many reachability analysis methods are restricted to systems with relatively small dimensions. One significant reason for this limitation is that those approaches, and their implementations, are not designed to leverage parallelism. They use algorithms that are designed to run serially within one compute unit, and they cannot utilize widely available high-performance computing (HPC) platforms such as many-core CPUs, GPUs and cloud-computing services. This paper presents PIRK, a tool that uses HPC platforms to efficiently compute reachable sets for general nonlinear systems of extremely high dimensions. PIRK has been tested on several systems, with state dimensions up to 4 billion. The scalability of PIRK's parallel implementations is found to be highly favorable.