Search results - Inner Mongolia University Library

Optimizing regular computations based on neural networks and Graph Traversal

Procedia Computer Science, 2021, vol. 186, pp. 337-343

Authors: O.T. Mohammed; M.S. Heidari; A.A. Paznikov. Affiliation: Department of computer science knowledge and discovery, Saint Petersburg Electrotechnical University “LETI”, ul. Professora Popova 5, St. Petersburg 197376, Russia

Modern multicore computers handle ordinary-sized numbers easily, but as numbers grow the computation becomes more complex because the size of CPU registers and buses is limited. As a result, arithmetic operations such as addition, subtraction, multiplication, and division become harder for the CPU to perform. A number of algorithms have been developed for computing on numbers with many digits. However, the existing algorithms are noticeably slow because they operate on individual bits and are designed to run on single-core computers only. In this paper, an AI model is presented that performs computation on tokens of 8-digit numbers to help boost CPU computation performance.

Keywords: Node iteration algorithms; Sequential algorithms; Parallel algorithms; Neural networks
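The idea of working on 8-digit tokens rather than individual bits can be illustrated with ordinary schoolbook addition in base 10^8. This is only a minimal sketch of token-wise arithmetic, not the paper's neural model; the function names (`to_tokens`, `add_tokens`, `from_tokens`, `add_big`) are hypothetical and chosen for illustration.

```python
WIDTH = 8            # decimal digits per token (assumption taken from the abstract)
BASE = 10 ** WIDTH   # each token is a value in [0, BASE)


def to_tokens(digits):
    """Split a non-negative decimal string into 8-digit tokens, least significant first."""
    tokens = []
    for end in range(len(digits), 0, -WIDTH):
        start = max(0, end - WIDTH)
        tokens.append(int(digits[start:end]))
    return tokens


def add_tokens(a, b):
    """Add two token lists, propagating at most one carry per token."""
    result = []
    carry = 0
    for i in range(max(len(a), len(b))):
        x = a[i] if i < len(a) else 0
        y = b[i] if i < len(b) else 0
        carry, low = divmod(x + y + carry, BASE)
        result.append(low)
    if carry:
        result.append(carry)
    return result


def from_tokens(tokens):
    """Join tokens back into a decimal string, zero-padding all but the top token."""
    head = str(tokens[-1])
    tail = "".join(f"{t:0{WIDTH}d}" for t in reversed(tokens[:-1]))
    return head + tail


def add_big(a, b):
    """Add two non-negative decimal strings token by token."""
    return from_tokens(add_tokens(to_tokens(a), to_tokens(b)))


print(add_big("99999999", "1"))
```

Each loop iteration handles eight decimal digits at once, so a number with n digits needs about n/8 additions instead of one step per bit or digit.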

AMYTISS: Parallelized Automated Controller Synthesis for Large-Scale Stochastic Systems

32nd International Conference on Computer-Aided Verification (CAV)

Authors: Lavaei, Abolfazl; Khaled, Mahmoud; Soudjani, Sadegh; Zamani, Majid. Affiliations: Ludwig Maximilians Univ Munchen, Dept Comp Sci, Munich, Germany; Tech Univ Munich, Dept Elect Engn, Munich, Germany; Newcastle Univ, Sch Comp, Newcastle Upon Tyne, Tyne & Wear, England; Univ Colorado, Dept Comp Sci, Boulder, CO 80309, USA

ISBN: (print) 9783030532918; 9783030532901

In this paper, we propose a software tool, called AMYTISS, implemented in C++/OpenCL, for designing correct-by-construction controllers for large-scale discrete-time stochastic systems. This tool is employed to (i) build finite Markov decision processes (MDPs) as finite abstractions of given original systems, and (ii) synthesize controllers for the constructed finite MDPs satisfying bounded-time high-level properties including safety, reachability and reach-avoid specifications. In AMYTISS, scalable parallel algorithms are designed such that they support the parallel execution within CPUs, GPUs and hardware accelerators (HWAs). Unlike all existing tools for stochastic systems, AMYTISS can utilize high-performance computing (HPC) platforms and cloud-computing services to mitigate the effects of the state-explosion problem, which is always present in analyzing large-scale stochastic systems. We benchmark AMYTISS against the most recent tools in the literature using several physical case studies including robot examples, room temperature and road traffic networks. We also apply our algorithms to a 3-dimensional autonomous vehicle and 7-dimensional nonlinear model of a BMW 320i car by synthesizing an autonomous parking controller.

Keywords: Parallel algorithms; Finite MDPs; Automated controller synthesis; Discrete-time stochastic systems; High performance computing platform
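Step (ii) of the abstract, synthesizing a controller for a finite MDP under a bounded-time safety specification, can be sketched as a backward dynamic program. The sketch below is sequential Python and assumes the finite MDP is already given; it is not AMYTISS code, and the function name, the toy states, and the transition probabilities are all made up for illustration.

```python
def synthesize_safety_controller(states, actions, P, safe, horizon):
    """Maximize the probability of staying in `safe` for `horizon` steps.

    P[s][a] is a dict {next_state: probability}.
    Returns (value, policy), where policy[k][s] is the action at time step k.
    """
    # With zero steps remaining, a state satisfies safety iff it is safe now.
    value = {s: (1.0 if s in safe else 0.0) for s in states}
    policy = []
    for _ in range(horizon):
        new_value, step_policy = {}, {}
        # Each state's update reads only the previous `value`,
        # so this loop is independent across states.
        for s in states:
            if s not in safe:
                new_value[s] = 0.0
                continue
            best_action, best_prob = None, -1.0
            for a in actions:
                prob = sum(p * value[t] for t, p in P[s][a].items())
                if prob > best_prob:
                    best_action, best_prob = a, prob
            new_value[s] = best_prob
            step_policy[s] = best_action
        value = new_value
        policy.append(step_policy)
    policy.reverse()  # policy[0] is the decision rule applied first
    return value, policy


# Toy MDP (hypothetical numbers): state 2 is unsafe and absorbing.
states = [0, 1, 2]
actions = ["stay", "move"]
P = {
    0: {"stay": {0: 0.9, 2: 0.1}, "move": {1: 1.0}},
    1: {"stay": {1: 0.8, 2: 0.2}, "move": {0: 0.5, 2: 0.5}},
    2: {"stay": {2: 1.0}, "move": {2: 1.0}},
}
value, policy = synthesize_safety_controller(states, actions, P, {0, 1}, horizon=2)
print(value, policy[0])
```

Because the update for one state never depends on the update for another state within the same step, the inner loop over states is the natural place to distribute work across compute units.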

Improving Barnes-Hut t-SNE Scalability in GPU with Efficient Memory Access Strategies

International Joint Conference on Neural Networks (IJCNN) held as part of the IEEE World Congress on Computational Intelligence (IEEE WCCI)

Authors: Meyer, Bruno Henrique; Ramirez Pozo, Aurora Trinidad; Nunan Zola, Wagner M. Affiliation: Univ Fed Parana, Dept Informat, Curitiba, Parana, Brazil

ISBN: (print) 9781728169262

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a widely used technique for dimensionality reduction; however, its application to large datasets is still an issue. To address this, BH-tSNE was proposed, a successful approximation in which the Barnes-Hut algorithm replaces a t-SNE step that has quadratic time complexity. Even so, this improvement still has limitations when processing large data volumes (millions of records). Recent studies, such as t-SNE-CUDA, have used GPUs to implement highly parallel BH-tSNE. In this research, a new GPU BH-tSNE implementation is proposed using efficient memory access strategies and recent acceleration techniques. Moreover, the embedding of multidimensional data points into three-dimensional space is applied. We examine scalability issues in one of the most expensive steps of GPU BH-tSNE. Our design allows up to 340% faster execution when compared to the t-SNE-CUDA implementation.

Keywords: Dimensionality reduction; Big data; Visualization of data; t-SNE; BH-tSNE; Barnes-Hut; Parallel algorithms; GPGPU

Introducing multi-level parallelism, at coarse, fine and instruction level to enhance the performance of iterative solvers for large sparse linear systems on Multi- and Many-core architecture

IEEE/ACM 6th Workshop on the LLVM Compiler Infrastructure in HPC / Workshop on Hierarchical parallelism for Exascale Computing (LLVM-HPC and HiPar)

Author: Gratien, Jean-Marc. Affiliation: IFP Energies Nouvelles, Comp Sci Dept, Rueil Malmaison, France

ISBN: (print) 9780738110424

With the evolution of High Performance Computing, multi-core and many-core systems are now a common feature of new hardware architectures. The introduction of a very large number of cores at the processor level is challenging because it requires handling multi-level parallelism at various levels, either coarse or fine, to take full advantage of the offered computing power. The induced programming effort can be addressed with parallel programming models based on the data flow model and the task programming paradigm [1]. To do so, many of the standard numerical algorithms must be revisited, as they cannot be easily parallelized at the finest levels. Iterative linear solvers are a key part of petroleum reservoir simulation, as they can represent up to 80% of the total computing time. In these algorithms, the standard preconditioning methods for large, sparse and unstructured matrices, such as Incomplete LU Factorization (ILU) or Algebraic Multigrid (AMG), fail to scale on shared-memory architectures with a large number of cores. In this paper we reconsider preconditioning algorithms to better introduce multi-level parallelism at the coarse level with MPI, at the fine level with threads, and at the instruction level to enable SIMD optimizations. This paper illustrates how we enhance the implementation of preconditioners like the multi-level domain decomposition (DDML) preconditioners [2], based on the popular Additive Schwarz Method (ASM), or the classical ILU0 preconditioner with the fine-grained parallel fixed-point variant presented in [3]. Our approach is validated on linear systems extracted from realistic petroleum reservoir simulations. The robustness of the preconditioners is tested with respect to the data heterogeneities of the study cases. We evaluate the extensibility of our implementation regarding the model sizes and its scalability regarding the large number of cores provided by new KNL processors or multi-node clusters.

Keywords: HPC; Task Programming; Many-Cores; Multi-Level Parallelism; Parallel algorithms

Optimization of CO2 Vibrational Kinetics Modeling in the Full State-to-State Approach

VESTNIK ST PETERSBURG UNIVERSITY-MATHEMATICS, 2020, vol. 53, no. 3, pp. 358-365

Authors: Gorikhovskii, V. I.; Nagnibeda, E. A. Affiliation: St Petersburg State Univ, St Petersburg 199034, Russia

Numerical modeling of nonequilibrium state-to-state carbon dioxide kinetics is a challenging, time-consuming computational task that involves solving a huge system of stiff differential equations and requires optimized methods. In the present study, we propose and analyse optimizations for the Extended Backward Differentiation Formula (EBDF) scheme. Using adaptive timesteps instead of fixed ones reduces the number of steps in the algorithm by a factor of many thousands, although with an increase in per-step complexity. The use of parallel computations to calculate relaxation terms allows one to further reduce the computation time. Numerical experiments on the modeling of spatially homogeneous carbon dioxide vibrational relaxation were performed for optimized computational schemes of different orders. Based on them, the best-performing algorithm was recommended: a parallel fourth-order EBDF scheme with an adaptive timestep. This method requires less computational time and memory and is highly stable.

Keywords: vibrational kinetics; carbon dioxide; parallel algorithms; state-to-state approach; optimization of numerical calculations

Parallel cross interpolation for high-precision calculation of high-dimensional integrals

COMPUTER PHYSICS COMMUNICATIONS, 2020, vol. 246, article 106869

Authors: Dolgov, Sergey; Savostyanov, Dmitry. Affiliations: Univ Bath, Bath BA2 7AY, Avon, England; Univ Brighton, Lewes Rd, Brighton BN2 4GJ, E Sussex, England

We propose a parallel version of the cross interpolation algorithm and apply it to calculate high-dimensional integrals motivated by the Ising model in quantum physics. In contrast to mainstream approaches, such as Monte Carlo and quasi Monte Carlo, the samples calculated by our algorithm are neither random nor arranged in a regular lattice. Instead we calculate the given function along individual dimensions (modes) and use these values to reconstruct its behaviour in the whole domain. The positions of the calculated univariate fibres are chosen adaptively for the given function. The required evaluations can be executed in parallel along each mode (variable) and over all modes. To demonstrate the efficiency of the proposed method, we apply it to compute high-dimensional Ising susceptibility integrals, arising from asymptotic expansions for the spontaneous magnetisation in the two-dimensional Ising model of ferromagnetism. We observe strong superlinear convergence of the proposed method, while the MC and qMC algorithms converge sublinearly. Using multiple precision arithmetic, we also observe exponential convergence of the proposed algorithm. Combining high-order convergence, almost perfect scalability up to hundreds of processes, and the same flexibility as MC and qMC, the proposed algorithm can be a new method of choice for problems involving high-dimensional integration, e.g. in statistics, probability, and quantum physics. (C) 2019 The Authors. Published by Elsevier B.V.

Keywords: High-dimensional integration; High precision; Tensor train format; Cross interpolation; Ising integrals; Parallel algorithms

PIRK: Scalable Interval Reachability Analysis for High-Dimensional Nonlinear Systems

32nd International Conference on Computer-Aided Verification (CAV)

Authors: Devonport, Alex; Khaled, Mahmoud; Arcak, Murat; Zamani, Majid. Affiliations: Univ Calif Berkeley, Berkeley, CA 94720, USA; Tech Univ Munich, Munich, Germany; Univ Colorado, Boulder, CO 80309, USA; Ludwig Maximilians Univ Munchen, Munich, Germany

ISBN: (print) 9783030532888; 9783030532871

Reachability analysis is a critical tool for the formal verification of dynamical systems and the synthesis of controllers for them. Due to their computational complexity, many reachability analysis methods are restricted to systems with relatively small dimensions. One significant reason for this limitation is that those approaches, and their implementations, are not designed to leverage parallelism. They use algorithms that are designed to run serially within one compute unit, and they cannot utilize widely available high-performance computing (HPC) platforms such as many-core CPUs, GPUs and cloud-computing services. This paper presents PIRK, a tool to efficiently compute reachable sets for general nonlinear systems of extremely high dimensions. PIRK can utilize HPC platforms for computing reachable sets for general high-dimensional nonlinear systems. PIRK has been tested on several systems, with state dimensions up to 4 billion. The scalability of PIRK's parallel implementations is found to be highly favorable.

Keywords: Reachability analysis; ODE integration; Runge-Kutta method; Mixed monotonicity; Monte Carlo simulation; Parallel algorithms

Asynchronous Stochastic Gradient Descent over Decentralized Datasets

16th IEEE International Conference on Control and Automation (ICCA)

Authors: Du, Yubo; You, Keyou; Mo, Yilin. Affiliations: Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China; Tsinghua Univ, BNRist, Beijing 100084, Peoples R China

ISBN: (digital) 9781728190938; (print) 9781728190938

Asynchronous stochastic gradient descent (ASGD) usually works in the centralized setting, in which workers retrieve data from a shared training set. This paper focuses on decentralized scenarios where each worker accesses only a subset of the whole training set. We find that, due to the heterogeneous properties of the decentralized setting, ASGD will optimize in wrong directions and thus obtain poor solutions. To tackle the issue, a novel algorithm, DASGD, is proposed for this setting. Our key idea is to form an asymptotically unbiased, accurate gradient estimate by reweighting stochastic gradients using the importance sampling technique. Numerical results substantiate the performance of the proposed algorithm in the decentralized setting.

Keywords: Training; Stochastic processes; Probability distribution; Parallel algorithms; Computational modeling; Optimization; Computer architecture
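The reweighting idea in the abstract can be illustrated with the basic importance-sampling identity. The sketch below is not the DASGD algorithm itself: it assumes a scalar quadratic loss, assumes the update frequency of each worker is known exactly, and computes expectations in closed form instead of simulating asynchrony. All names and numbers are hypothetical.

```python
def shard_gradient(shard, w):
    """Gradient of the mean of 0.5 * (w - x)**2 over one worker's shard."""
    return sum(w - x for x in shard) / len(shard)


def full_gradient(shards, w):
    """Gradient of the same loss averaged over every sample in every shard."""
    total = sum(len(shard) for shard in shards)
    return sum(w - x for shard in shards for x in shard) / total


def expected_update(shards, freqs, w, reweight):
    """Expected gradient when worker i delivers an update with probability freqs[i].

    With reweight=True each worker's gradient is scaled by
    (its share of the data) / (its update frequency).
    """
    total = sum(len(shard) for shard in shards)
    expected = 0.0
    for shard, q in zip(shards, freqs):
        weight = (len(shard) / total) / q if reweight else 1.0
        expected += q * weight * shard_gradient(shard, w)
    return expected


# Worker 0 holds little data but updates often; worker 1 is the opposite.
shards = [[0.0, 2.0], [10.0, 12.0, 14.0, 16.0]]
freqs = [0.8, 0.2]
w = 0.0
print(full_gradient(shards, w))                          # target direction
print(expected_update(shards, freqs, w, reweight=False)) # biased toward worker 0
print(expected_update(shards, freqs, w, reweight=True))  # matches the target
```

Without reweighting, the frequent worker dominates the expected update even though it holds a third of the data; dividing by the update frequency cancels that effect in expectation.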

Efficient 2D Tensor Network Simulation of Quantum Systems

International Conference on High Performance Computing, Networking, Storage and Analysis (SC)

Authors: Pang, Yuchen; Hao, Tianyi; Dugad, Annika; Zhou, Yiqing; Solomonik, Edgar. Affiliation: Univ Illinois, Dept Comp Sci, Urbana, IL 61801, USA

ISBN: (print) 9781728199986

Simulation of quantum systems is challenging due to the exponential size of the state space. Tensor networks provide a systematically improvable approximation for quantum states. 2D tensor networks such as Projected Entangled Pair States (PEPS) are well-suited for key classes of physical systems and quantum circuits. However, direct contraction of PEPS networks has exponential cost, while approximate algorithms require computations with large tensors. We propose new scalable algorithms and software abstractions for PEPS-based methods, accelerating the bottleneck operation of contraction and refactorization of a tensor subnetwork. We employ randomized SVD with an implicit matrix to reduce cost and memory footprint asymptotically. Further, we develop a distributed-memory PEPS library and study accuracy and efficiency of alternative algorithms for PEPS contraction and evolution on the Stampede2 supercomputer. We also simulate a popular near-term quantum algorithm, the Variational Quantum Eigensolver (VQE), and benchmark Imaginary Time Evolution (ITE), which compute ground states of Hamiltonians.

Keywords: Numerical simulation; Parallel algorithms; Quantum mechanics; Quantum computing

Study of Fine-grained Nested Parallelism in CDCL SAT Solvers

ACM TRANSACTIONS ON PARALLEL COMPUTING, 2021, vol. 8, no. 3, pp. 1-18

Authors: Edwards, James; Vishkin, Uzi. Affiliation: Univ Maryland, Inst Adv Comp Studies UMIACS, Brendan Iribe Ctr Comp Sci & Engn, 8125 Paint Branch Dr, College Pk, MD 20742, USA

Boolean satisfiability (SAT) is an important performance-hungry problem with applications in many problem domains. However, most work on parallelizing SAT solvers has focused on coarse-grained, mostly embarrassing, parallelism. Here, we study fine-grained parallelism that can speed up existing sequential SAT solvers, which all happen to be of the so-called Conflict-Directed Clause Learning variety. We show the potential for speedups of up to 382x across a variety of problem instances. We hope that these results will stimulate future research, particularly with respect to a computer architecture open problem we present.

Keywords: Boolean satisfiability (SAT) solver; parallel algorithms; nested parallelism
