Modern multicore computers can easily manipulate numbers with few digits, but computation becomes more complex as numbers get bigger, because the size of both CPU registers and buses is limited. As a result, arithmetic operations such as addition, subtraction, multiplication, and division become harder for the CPU to perform. A number of algorithms have been developed to solve the problem of computing on big numbers. However, the existing algorithms are noticeably slow because they operate on bits individually and are designed to run on single-core computers only. This paper presents an AI model that performs computation on tokens of 8-digit numbers to help boost CPU computation performance.
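The idea of working on 8-digit tokens instead of individual bits can be illustrated with ordinary schoolbook addition. This is a minimal sketch only: it does not reproduce the AI model from the abstract, and the names `BASE`, `to_tokens`, `add_tokens` and `from_tokens` are hypothetical helpers chosen for illustration.

```python
BASE = 10 ** 8  # assumption: one token holds 8 decimal digits


def to_tokens(digits: str) -> list[int]:
    """Split a decimal string into 8-digit tokens, least significant first."""
    tokens = []
    for end in range(len(digits), 0, -8):
        start = max(0, end - 8)
        tokens.append(int(digits[start:end]))
    return tokens


def add_tokens(a: list[int], b: list[int]) -> list[int]:
    """Add two token lists, propagating one carry per token."""
    result = []
    carry = 0
    for i in range(max(len(a), len(b))):
        total = carry
        total += a[i] if i < len(a) else 0
        total += b[i] if i < len(b) else 0
        result.append(total % BASE)
        carry = total // BASE
    if carry:
        result.append(carry)
    return result


def from_tokens(tokens: list[int]) -> str:
    """Join tokens back into a decimal string, zero-padding inner tokens."""
    head = str(tokens[-1])
    tail = "".join(f"{t:08d}" for t in reversed(tokens[:-1]))
    return head + tail
```

Calling `from_tokens(add_tokens(to_tokens(x), to_tokens(y)))` on two decimal strings returns their sum as a string. Each token fits comfortably in a 64-bit register, so one machine addition handles eight digits at a time.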
ISBN:
(print) 9783030532918; 9783030532901
In this paper, we propose a software tool, called AMYTISS, implemented in C++/OpenCL, for designing correct-by-construction controllers for large-scale discrete-time stochastic systems. The tool is used to (i) build finite Markov decision processes (MDPs) as finite abstractions of given original systems, and (ii) synthesize controllers for the constructed finite MDPs that satisfy bounded-time high-level properties, including safety, reachability and reach-avoid specifications. AMYTISS provides scalable parallel algorithms that support parallel execution on CPUs, GPUs and hardware accelerators (HWAs). Unlike all existing tools for stochastic systems, AMYTISS can utilize high-performance computing (HPC) platforms and cloud-computing services to mitigate the effects of the state-explosion problem, which is always present in analyzing large-scale stochastic systems. We benchmark AMYTISS against the most recent tools in the literature using several physical case studies, including robot examples, room temperature and road traffic networks. We also apply our algorithms to a 3-dimensional autonomous vehicle and a 7-dimensional nonlinear model of a BMW 320i car by synthesizing an autonomous parking controller.
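Controller synthesis for a bounded-time safety specification on a finite MDP can be sketched with the standard backward value recursion. The code below is an illustrative assumption, not AMYTISS code: the function name `synthesize_safety` and the transition format `P[s][a]` (a dict from next state to probability) are hypothetical.

```python
def synthesize_safety(P, safe, horizon):
    """Maximize the probability of staying in `safe` for `horizon` steps.

    P[s][a] maps next states to probabilities for action a in state s.
    Returns (value, policy): value[s] is the best safety probability from s,
    and policy[k][s] is the action to take in state s at time step k.
    """
    n = len(P)
    value = [1.0 if s in safe else 0.0 for s in range(n)]
    policy = []
    for _ in range(horizon):
        new_value = [0.0] * n
        actions = [None] * n
        # Each state reads only the previous `value`, so the updates are
        # independent and this loop is the part that could run in parallel.
        for s in range(n):
            if s not in safe:
                continue
            scores = [sum(p * value[t] for t, p in dist.items()) for dist in P[s]]
            best = max(range(len(scores)), key=scores.__getitem__)
            new_value[s], actions[s] = scores[best], best
        value = new_value
        policy.insert(0, actions)  # backward recursion: earlier steps go first
    return value, policy
```

The resulting policy is time-dependent, which is typical for bounded-time specifications: the best action in a state can change as the remaining horizon shrinks.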
ISBN:
(print) 9781728169262
The t-Distributed Stochastic Neighbor Embedding (t-SNE) is a widely used technique for dimensionality reduction; however, its application to large datasets is still an issue. To address this, BH-tSNE was proposed, a successful approximation that uses the Barnes-Hut algorithm instead of computing a step of t-SNE with quadratic computational time complexity. Even so, this improvement still has limitations when processing large data volumes (millions of records). Recent studies, such as t-SNE-CUDA, have used GPUs to implement highly parallel BH-tSNE. In this research, a new GPU BH-tSNE implementation is proposed that uses efficient memory access strategies and recent acceleration techniques. Moreover, the embedding of multidimensional data points into three-dimensional space is applied. We examine scalability issues in one of the most expensive steps of GPU BH-tSNE. Our design allows up to 340% faster execution when compared to the t-SNE-CUDA implementation.
ISBN:
(print) 9780738110424
With the evolution of High Performance Computing, multi-core and many-core systems are now a common feature of new hardware architectures. The introduction of a very large number of cores at the processor level is challenging because it requires handling parallelism at multiple levels, both coarse and fine, to take full advantage of the offered computing power. The induced programming effort can be addressed with parallel programming models based on the data flow model and the task programming paradigm [1]. To do so, many of the standard numerical algorithms must be revisited, as they cannot be easily parallelized at the finest levels. Iterative linear solvers are a key part of petroleum reservoir simulation, as they can represent up to 80% of the total computing time. In these algorithms, the standard preconditioning methods for large, sparse and unstructured matrices, such as Incomplete LU Factorization (ILU) or Algebraic Multigrid (AMG), fail to scale on shared-memory architectures with a large number of cores. In this paper we reconsider preconditioning algorithms to better introduce multi-level parallelism: at the coarse level with MPI, at the fine level with threads, and at the instruction level to enable SIMD optimizations. This paper illustrates how we enhance the implementation of preconditioners such as the multi-level domain decomposition (DDML) preconditioners [2], based on the popular Additive Schwarz Method (ASM), or the classical ILU0 preconditioner with the fine-grained parallel fixed-point variant presented in [3]. Our approach is validated on linear systems extracted from realistic petroleum reservoir simulations. The robustness of the preconditioners is tested with respect to the data heterogeneities of the study cases. We evaluate the extensibility of our implementation with respect to model size, and its scalability with respect to the large number of cores provided by new KNL processors or multi-node clusters.
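One way a fixed-point formulation exposes fine-grained parallelism in ILU0 is a Jacobi-style sweep in which every entry of the factors is recomputed from the previous iterate only. The sketch below assumes that formulation; it may differ from the variant in [3], it uses dense lists for brevity, and `ilu0_fixed_point` is a hypothetical name.

```python
def ilu0_fixed_point(A, sweeps):
    """Approximate the ILU0 factors of A with Jacobi-style fixed-point sweeps.

    A is a dense list of lists; only its nonzero pattern is updated.
    Returns (L, U) with L unit lower triangular and U upper triangular.
    """
    n = len(A)
    pattern = [(i, j) for i in range(n) for j in range(n) if A[i][j] != 0.0]
    # Initial guess: strict lower part of A for L, upper part of A for U.
    L = [[1.0 if i == j else (A[i][j] if i > j else 0.0) for j in range(n)]
         for i in range(n)]
    U = [[A[i][j] if i <= j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        new_L = [row[:] for row in L]
        new_U = [row[:] for row in U]
        # Every update reads only the old L and U, so all entries in the
        # pattern could be processed concurrently.
        for i, j in pattern:
            s = sum(L[i][k] * U[k][j] for k in range(min(i, j)))
            if i > j:
                new_L[i][j] = (A[i][j] - s) / U[j][j]
            else:
                new_U[i][j] = A[i][j] - s
        L, U = new_L, new_U
    return L, U
```

For a tridiagonal matrix, ILU0 coincides with the exact LU factorization, which makes a small tridiagonal system a convenient check that the sweeps converge.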
Numerical modeling of nonequilibrium state-to-state carbon dioxide kinetics is a challenging, time-consuming computational task that involves solving a huge system of stiff differential equations and requires optimized solution methods. In the present study, we propose and analyse optimizations for the Extended Backward Differential Formula (EBDF) scheme. Using adaptive timesteps instead of fixed ones reduces the number of steps in the algorithm by a factor of many thousands, although with an increase in step complexity. Using parallel computations to calculate relaxation terms further reduces the computation time. Numerical experiments on the modeling of spatially homogeneous carbon dioxide vibrational relaxation were performed for optimized computational schemes of different orders. Based on them, the best-performing algorithm was recommended: a parallel fourth-order EBDF scheme with an adaptive timestep. This method requires less computation time and memory and is highly stable.
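The effect of an adaptive timestep on a stiff problem can be illustrated with a much simpler scheme than EBDF. The sketch below swaps in backward Euler with step doubling on the scalar test equation y' = -lam * y; the equation, the tolerance, the starting step and the names `backward_euler_step` and `integrate_adaptive` are all assumptions made for illustration.

```python
def backward_euler_step(y, h, lam):
    """One implicit Euler step for y' = -lam * y (solved in closed form)."""
    return y / (1.0 + h * lam)


def integrate_adaptive(y0, t_end, lam, tol=1e-4, h0=1e-6):
    """Integrate to t_end, adapting h by comparing one full step to two half steps.

    Returns (y, steps), where steps counts accepted timesteps.
    """
    t, y, h, steps = 0.0, y0, h0, 0
    while t < t_end:
        h = min(h, t_end - t)  # do not step past the end time
        full = backward_euler_step(y, h, lam)
        half = backward_euler_step(backward_euler_step(y, h / 2, lam), h / 2, lam)
        if abs(full - half) <= tol:
            # Accept the more accurate two-half-step result and try a larger step.
            t += h
            y = half
            steps += 1
            h *= 2.0
        else:
            # Reject the step and retry with a smaller one.
            h *= 0.5
    return y, steps
```

With `lam=1000.0` and `t_end=1.0`, a fixed step of `h0=1e-6` would need a million steps, while the adaptive loop takes small steps only during the fast initial transient. Each adaptive step costs three implicit solves instead of one, which mirrors the trade-off between step count and step complexity noted above.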
We propose a parallel version of the cross interpolation algorithm and apply it to calculate high-dimensional integrals motivated by the Ising model in quantum physics. In contrast to mainstream approaches, such as Monte Carlo and quasi Monte Carlo, the samples calculated by our algorithm are neither random nor form a regular lattice. Instead, we calculate the given function along individual dimensions (modes) and use these values to reconstruct its behaviour in the whole domain. The positions of the calculated univariate fibres are chosen adaptively for the given function. The required evaluations can be executed in parallel along each mode (variable) and over all modes. To demonstrate the efficiency of the proposed method, we apply it to compute high-dimensional Ising susceptibility integrals, arising from asymptotic expansions for the spontaneous magnetisation in the two-dimensional Ising model of ferromagnetism. We observe strong superlinear convergence of the proposed method, while the MC and qMC algorithms converge sublinearly. Using multiple precision arithmetic, we also observe exponential convergence of the proposed algorithm. Combining high-order convergence, almost perfect scalability up to hundreds of processes, and the same flexibility as MC and qMC, the proposed algorithm can be a new method of choice for problems involving high-dimensional integration, e.g. in statistics, probability, and quantum physics. (C) 2019 The Authors. Published by Elsevier B.V.
ISBN:
(print) 9783030532888; 9783030532871
Reachability analysis is a critical tool for the formal verification of dynamical systems and the synthesis of controllers for them. Due to their computational complexity, many reachability analysis methods are restricted to systems with relatively small dimensions. One significant reason for this limitation is that those approaches, and their implementations, are not designed to leverage parallelism. They use algorithms that are designed to run serially within one compute unit, and they cannot utilize widely available high-performance computing (HPC) platforms such as many-core CPUs, GPUs and cloud-computing services. This paper presents PIRK, a tool to efficiently compute reachable sets for general nonlinear systems of extremely high dimensions. PIRK can utilize HPC platforms for computing reachable sets for general high-dimensional nonlinear systems. PIRK has been tested on several systems, with state dimensions up to 4 billion. The scalability of PIRK's parallel implementations is found to be highly favorable.
ISBN:
(digital) 9781728190938
ISBN:
(print) 9781728190938
Asynchronous stochastic gradient descent (ASGD) usually works in the centralized setting, in which workers retrieve data from a shared training set. This paper focuses on decentralized scenarios where each worker accesses only a subset of the whole training set. We find that, due to the heterogeneous properties of the decentralized setting, ASGD will optimize in wrong directions and thus obtain poor solutions. To tackle this issue, a novel algorithm, DASGD, is proposed for the above setting. Our key idea is to form an asymptotically unbiased, accurate gradient estimate by reweighting stochastic gradients using importance sampling. Numerical results substantiate the performance of the proposed algorithm in the decentralized setting.
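The bias described here, and the way importance weights remove it, can be shown with a toy calculation. This is not the DASGD algorithm itself: the model assumes that worker k contributes an update with probability `update_freq[k]` and holds `shard_sizes[k]` samples, and both function names are hypothetical.

```python
def expected_update(local_grads, update_freq, weights=None):
    """Expected aggregated gradient when worker k contributes with
    probability update_freq[k] and its gradient is scaled by weights[k]."""
    if weights is None:
        weights = [1.0] * len(local_grads)
    return sum(q * w * g for q, w, g in zip(update_freq, weights, local_grads))


def importance_weights(shard_sizes, update_freq):
    """Importance weight = target probability / sampling probability.

    The target probability of worker k is its share of the whole training
    set; the sampling probability is how often it actually updates.
    """
    total = sum(shard_sizes)
    return [(n / total) / q for n, q in zip(shard_sizes, update_freq)]


# Toy setup (assumed numbers): a fast worker with a small shard and a slow
# worker with a large shard whose local gradients point in opposite directions.
local_grads = [2.0, -1.0]
shard_sizes = [100, 300]
update_freq = [0.8, 0.2]

global_grad = sum(n * g for n, g in zip(shard_sizes, local_grads)) / sum(shard_sizes)
naive = expected_update(local_grads, update_freq)
weights = importance_weights(shard_sizes, update_freq)
reweighted = expected_update(local_grads, update_freq, weights)
```

In this toy setup the unweighted expectation `naive` has the opposite sign to `global_grad`, because the fast worker dominates, while `reweighted` matches `global_grad`.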
ISBN:
(print) 9781728199986
Simulation of quantum systems is challenging due to the exponential size of the state space. Tensor networks provide a systematically improvable approximation for quantum states. 2D tensor networks such as Projected Entangled Pair States (PEPS) are well-suited for key classes of physical systems and quantum circuits. However, direct contraction of PEPS networks has exponential cost, while approximate algorithms require computations with large tensors. We propose new scalable algorithms and software abstractions for PEPS-based methods, accelerating the bottleneck operation of contraction and refactorization of a tensor subnetwork. We employ randomized SVD with an implicit matrix to reduce cost and memory footprint asymptotically. Further, we develop a distributed-memory PEPS library and study accuracy and efficiency of alternative algorithms for PEPS contraction and evolution on the Stampede2 supercomputer. We also simulate a popular near-term quantum algorithm, the Variational Quantum Eigensolver (VQE), and benchmark Imaginary Time Evolution (ITE), which compute ground states of Hamiltonians.
Boolean satisfiability (SAT) is an important performance-hungry problem with applications in many problem domains. However, most work on parallelizing SAT solvers has focused on coarse-grained, mostly embarrassing, parallelism. Here, we study fine-grained parallelism that can speed up existing sequential SAT solvers, which all happen to be of the so-called Conflict-Directed Clause Learning variety. We show the potential for speedups of up to 382x across a variety of problem instances. We hope that these results will stimulate future research, particularly with respect to a computer architecture open problem we present.