Search Results - Inner Mongolia University Library

27th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)

Authors: Haryan, Chirayu Anant; Ramakrishna, G.; Nasre, Rupesh; Reddy, Allam Dinesh. Affiliations: Indian Inst Technol Tirupati, Dept CSE, Tirupati, Andhra Pradesh, India; Indian Inst Technol Madras, Dept CSE, Madras, Tamil Nadu, India; Zippr Private Ltd, Hyderabad, India

ISBN: (print) 9781665422925

Given a temporal graph G, a source vertex s, and a departure time ts at the source vertex, the earliest arrival time (EAT) problem is to start from s on or after ts and reach all the vertices in G as early as possible. Ni et al. have proposed a parallel algorithm for EAT and obtained a speedup of up to 9.5x on real-world graphs with respect to the connection-scan serial algorithm by using multi-core processors. We propose a topology-driven parallel algorithm for EAT on public transport networks and implement it using general-purpose programming on the graphics processing unit (GPU). A temporal connection in a temporal graph for a public transport network is associated with a departure time and a duration, and many connections exist from u to v for an edge (u, v). We propose two pruning techniques, connection-type and clustering, and use an arithmetic progression technique to process the many connections of an edge without scanning all of them. In the connection-type technique, the connections of an edge with the same duration are grouped together. In the clustering technique, we follow the 24-hour format and partition the connections of an edge into 24 clusters so that the departure time of connections in the ith cluster is at least i hours and at most i + 1 hours. The arithmetic progression technique helps to store a sequence of departure times of various connections in a compact way. We propose a hybrid approach to combine the three techniques (connection-type, clustering and arithmetic progression) in an efficient way. Our techniques achieve an average speedup of up to 61x when compared to the existing connection-scan serial algorithm running on CPU. Also, the average speedup of our algorithm is 12.65x against the parallel edge-scan-dependency graph algorithm running on GPU.

Keywords: Earliest arrival time; temporal graphs; public transport networks; parallel algorithms; graphics processing unit
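
The arithmetic progression idea in the abstract above can be illustrated with a small sketch. It assumes that one group of connections on an edge shares a duration and departs at evenly spaced times, so the first usable departure is found by arithmetic rather than by scanning. The names `ProgressionGroup`, `earliest_arrival` and `hour_cluster`, the minute-based time unit, and the half-open cluster boundaries are all hypothetical choices for illustration, not the authors' GPU implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProgressionGroup:
    """Connections of one edge with a shared duration and evenly spaced departures."""
    first: int     # first departure, in minutes after midnight (assumed unit)
    step: int      # gap between consecutive departures, must be > 0
    count: int     # number of departures in the progression
    duration: int  # travel time shared by every connection in the group


def earliest_arrival(group: ProgressionGroup, ready: int) -> Optional[int]:
    """Arrival time of the first connection departing at or after `ready`, else None."""
    if ready <= group.first:
        k = 0
    else:
        # Ceiling division: index of the first departure that is >= ready.
        k = -(-(ready - group.first) // group.step)
    if k >= group.count:
        return None  # every departure in this group has already left
    return group.first + k * group.step + group.duration


def hour_cluster(departure: int) -> int:
    """Cluster index i with i hours <= departure < i + 1 hours (assumed boundaries)."""
    return (departure // 60) % 24
```

For a group departing at 06:00, 06:15, ..., 07:00 with a 20-minute ride, a traveller ready at 06:16 gets the 06:30 departure in constant time, regardless of how many connections the group holds.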

Efficient 2D Tensor Network Simulation of Quantum Systems

International Conference on High Performance Computing, Networking, Storage and Analysis (SC)

Authors: Pang, Yuchen; Hao, Tianyi; Dugad, Annika; Zhou, Yiqing; Solomonik, Edgar. Affiliation: Univ Illinois, Dept Comp Sci, Urbana, IL 61801, USA

ISBN: (print) 9781728199986

Simulation of quantum systems is challenging due to the exponential size of the state space. Tensor networks provide a systematically improvable approximation for quantum states. 2D tensor networks such as Projected Entangled Pair States (PEPS) are well-suited for key classes of physical systems and quantum circuits. However, direct contraction of PEPS networks has exponential cost, while approximate algorithms require computations with large tensors. We propose new scalable algorithms and software abstractions for PEPS-based methods, accelerating the bottleneck operation of contraction and refactorization of a tensor subnetwork. We employ randomized SVD with an implicit matrix to reduce cost and memory footprint asymptotically. Further, we develop a distributed-memory PEPS library and study accuracy and efficiency of alternative algorithms for PEPS contraction and evolution on the Stampede2 supercomputer. We also simulate a popular near-term quantum algorithm, the Variational Quantum Eigensolver (VQE), and benchmark Imaginary Time Evolution (ITE), which compute ground states of Hamiltonians.

Keywords: Numerical simulation; Parallel algorithms; Quantum mechanics; Quantum computing

Study of Fine-grained Nested Parallelism in CDCL SAT Solvers

ACM TRANSACTIONS ON PARALLEL COMPUTING 2021, Vol. 8, No. 3, pp. 1-18

Authors: Edwards, James; Vishkin, Uzi. Affiliation: Univ Maryland, Inst Adv Comp Studies UMIACS, Brendan Iribe Ctr Comp Sci & Engn, 8125 Paint Branch Dr, College Pk, MD 20742, USA

Boolean satisfiability (SAT) is an important performance-hungry problem with applications in many problem domains. However, most work on parallelizing SAT solvers has focused on coarse-grained, mostly embarrassing, parallelism. Here, we study fine-grained parallelism that can speed up existing sequential SAT solvers, which all happen to be of the so-called Conflict-Directed Clause Learning variety. We show the potential for speedups of up to 382x across a variety of problem instances. We hope that these results will stimulate future research, particularly with respect to a computer architecture open problem we present.

Keywords: Boolean satisfiability (SAT) solver; parallel algorithms; nested parallelism

A GPU-Accelerated Barycentric Lagrange Treecode

34th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Vaughn, Nathan; Wilson, Leighton; Krasny, Robert. Affiliation: Univ Michigan, Dept Math, Ann Arbor, MI 48109, USA

ISBN: (digital) 9781728174457

ISBN: (print) 9781728174457

We present an MPI + OpenACC implementation of the kernel-independent barycentric Lagrange treecode (BLTC) for fast summation of particle interactions on GPUs. The distributed memory parallelization uses recursive coordinate bisection for domain decomposition and MPI remote memory access to build locally essential trees on each rank. The particle interactions are organized into target batch/source cluster interactions which efficiently map onto the GPU; target batching provides an outer level of parallelism, while the direct sum form of the barycentric particle-cluster approximation provides an inner level of parallelism. The GPU-accelerated BLTC performance is demonstrated on several test cases up to 1 billion particles interacting via the Coulomb potential and Yukawa potential.

Keywords: Heterogeneous (hybrid) systems; Graphics processors; Load balancing and task assignment; Interpolation; Numerical algorithms; Parallel algorithms; Chebyshev approximation and theory; Integral Equations
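
The barycentric Lagrange interpolation named in this entry can be sketched in one dimension. The example below assumes Chebyshev points of the second kind and their standard barycentric weights; it shows only the interpolation building block, not the treecode, the particle-cluster approximation, or the MPI + OpenACC implementation. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def chebyshev_points(n: int, a: float = -1.0, b: float = 1.0) -> List[float]:
    """n + 1 Chebyshev points of the second kind mapped to [a, b] (requires n >= 1)."""
    mid, half = 0.5 * (a + b), 0.5 * (b - a)
    return [mid + half * math.cos(math.pi * j / n) for j in range(n + 1)]


def barycentric_weights(n: int) -> List[float]:
    """Weights for those points: alternating signs, halved at both endpoints."""
    weights = [(-1.0) ** j for j in range(n + 1)]
    weights[0] *= 0.5
    weights[n] *= 0.5
    return weights


def barycentric_interpolate(xs: Sequence[float], ws: Sequence[float],
                            fs: Sequence[float], x: float) -> float:
    """Evaluate the interpolant through (xs, fs) at x using the barycentric formula."""
    numerator = 0.0
    denominator = 0.0
    for xj, wj, fj in zip(xs, ws, fs):
        if x == xj:
            return fj  # x is a node: return the data value and avoid dividing by zero
        term = wj / (x - xj)
        numerator += term * fj
        denominator += term
    return numerator / denominator
```

Both sums have the same direct-sum shape over the nodes, which is the kind of structure the abstract describes as providing an inner level of parallelism.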

Accelerated Gaussian Convolution in a Data Assimilation Scenario

20th Annual International Conference on Computational Science (ICCS)

Authors: De Luca, Pasquale; Galletti, Ardelio; Giunta, Giulio; Marcellino, Livia. Affiliations: Univ Salerno, Dept Comp Sci, Fisciano, Italy; Univ Naples Parthenope, Dept Sci & Technol, Naples, Italy

ISBN: (print) 9783030504335; 9783030504328

Machine Learning algorithms try to provide an adequate forecast for predicting and understanding a multitude of phenomena. However, due to the chaotic nature of real systems, it is very difficult to predict data: a small perturbation of the initial state can generate serious errors. Data Assimilation is used to estimate the best initial state of a system in order to carefully predict its future states. Therefore, an accurate and fast Data Assimilation can be considered a fundamental step for the entire Machine Learning process. Here, we deal with the Gaussian convolution operation, which is a central step of the Data Assimilation approach and, more generally, of several data analysis procedures. In particular, we propose a parallel algorithm based on the use of Recursive Filters to approximate the Gaussian convolution in a very fast way. Tests and experiments confirm the efficiency of the proposed implementation.

Keywords: Gaussian convolution; Recursive filters; Parallel algorithms; GPU
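
As a rough illustration of approximating a Gaussian convolution with recursive filters, the sketch below applies a first-order filter as repeated forward and backward sweeps. The smoothing coefficient is chosen so that the accumulated variance of all passes equals sigma squared. The function names, the zero boundary condition, and the default of four passes are assumptions made here; this is a sequential sketch, not the parallel algorithm proposed in the paper.

```python
import math
from typing import List, Sequence, Tuple


def rf_coefficients(sigma: float, passes: int) -> Tuple[float, float]:
    """Coefficients (alpha, beta) such that `passes` sweeps have total variance sigma^2."""
    e = passes / (sigma * sigma)
    # Root in (0, 1) of passes * 2 * alpha / (1 - alpha)^2 = sigma^2.
    alpha = 1.0 + e - math.sqrt(e * (e + 2.0))
    return alpha, 1.0 - alpha


def recursive_gaussian(signal: Sequence[float], sigma: float, passes: int = 4) -> List[float]:
    """Smooth `signal` with repeated first-order forward/backward recursive sweeps."""
    alpha, beta = rf_coefficients(sigma, passes)
    out = list(signal)
    for _ in range(passes):
        prev = 0.0  # zero boundary condition (an assumption of this sketch)
        for i in range(len(out)):              # forward (causal) sweep
            prev = beta * out[i] + alpha * prev
            out[i] = prev
        nxt = 0.0
        for i in range(len(out) - 1, -1, -1):  # backward (anti-causal) sweep
            nxt = beta * out[i] + alpha * nxt
            out[i] = nxt
    return out
```

Each sweep costs a fixed number of operations per sample, independent of sigma, which is the usual motivation for recursive filters over direct convolution with a wide kernel.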

Parallel Data Distribution Management on Shared-memory Multiprocessors

ACM TRANSACTIONS ON MODELING AND COMPUTER SIMULATION 2020, Vol. 30, No. 1, pp. 5-5

Authors: Marzolla, Moreno; D'Angelo, Gabriele. Affiliation: Univ Bologna, Dept Comp Sci & Engn DISI, Mura Anteo Zamboni 7, I-90126 Bologna, Italy

The problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles appears frequently in the context of agent-based simulation studies. For this reason, the High Level Architecture (HLA) specification, a standard framework for interoperability among simulators, includes a Data Distribution Management (DDM) service whose responsibility is to report all intersections between a set of subscription and update regions. The algorithms at the core of the DDM service are CPU-intensive, and could greatly benefit from the large computing power of modern multi-core processors. In this article, we propose two parallel solutions to the DDM problem that can operate effectively on shared-memory multiprocessors. The first solution is based on a data structure (the interval tree) that allows concurrent computation of intersections between subscription and update regions. The second solution is based on a novel parallel extension of the Sort Based Matching algorithm, whose sequential version is considered among the most efficient solutions to the DDM problem. Extensive experimental evaluation of the proposed algorithms confirms their effectiveness in taking advantage of multiple execution units in a shared-memory architecture.

Keywords: Data distribution management (DDM); parallel and distributed simulation (PADS); high level architecture (HLA); parallel algorithms
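
The matching problem in this entry, reporting every overlapping subscription/update pair, can be sketched in one dimension with a sorted endpoint sweep. This is a sequential illustration in the spirit of sort-based matching, not the parallel extension or the interval-tree solution proposed in the article. The function name, the closed-interval convention, and the rule that touching intervals count as intersecting are assumptions.

```python
from typing import List, Set, Tuple

Interval = Tuple[float, float]  # closed interval [low, high], with low <= high


def sort_based_matching(subscriptions: List[Interval],
                        updates: List[Interval]) -> Set[Tuple[int, int]]:
    """Return every (subscription index, update index) pair whose intervals overlap."""
    events = []
    for idx, (low, high) in enumerate(subscriptions):
        events.append((low, 0, "S", idx))   # 0 marks an opening endpoint
        events.append((high, 1, "S", idx))  # 1 marks a closing endpoint
    for idx, (low, high) in enumerate(updates):
        events.append((low, 0, "U", idx))
        events.append((high, 1, "U", idx))
    # At equal coordinates openings sort before closings, so touching intervals match.
    events.sort()

    active_subs: Set[int] = set()
    active_upds: Set[int] = set()
    matches: Set[Tuple[int, int]] = set()
    for _, is_close, kind, idx in events:
        if kind == "S":
            if is_close:
                active_subs.discard(idx)
            else:
                # A newly opened subscription overlaps every update still open.
                matches.update((idx, u) for u in active_upds)
                active_subs.add(idx)
        else:
            if is_close:
                active_upds.discard(idx)
            else:
                matches.update((s, idx) for s in active_subs)
                active_upds.add(idx)
    return matches
```

For d-dimensional regions, one common approach is to run such a sweep per dimension and keep the pairs that overlap in every dimension; whether the article does this is not stated in the abstract.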

Provably optimal parallel transport sweeps on semi-structured grids

JOURNAL OF COMPUTATIONAL PHYSICS 2020, Vol. 407, Article 109234

Authors: Adams, Michael P.; Adams, Marvin L.; Hawkins, W. Daryl; Smith, Timmie; Rauchwerger, Lawrence; Amato, Nancy M.; Bailey, Teresa S.; Falgout, Robert D.; Kunen, Adam; Brown, Peter. Affiliations: Texas A&M Univ, Dept Nucl Engn, 3133 TAMU, College Stn, TX 77843, USA; Texas A&M Univ, Dept Comp Sci & Engn, 3112 TAMU, College Stn, TX 77843, USA; Univ Illinois, Dept Comp Sci, Chicago, IL 60680, USA; Lawrence Livermore Natl Lab, Livermore, CA 94550, USA

We have found provably optimal algorithms for full-domain discrete-ordinate transport sweeps on a class of grids in 2D and 3D Cartesian geometry that are regular at a coarse level but arbitrary within the coarse blocks. We describe these algorithms and show that they always execute the full eight-octant (or four-quadrant if 2D) sweep in the minimum possible number of stages for a given $P_x \times P_y \times P_z$ partitioning. Computational results confirm that our optimal scheduling algorithms execute sweeps in the minimum possible stage count. Observed parallel efficiencies agree well with our performance model. Our PDT transport code has achieved approximately 68% parallel efficiency with > 1.5M parallel threads, relative to 8 threads, on a simple weak-scaling problem with only three energy groups, 10 directions per octant, and 4096 cells/thread. Our ARDRA code has achieved 71% efficiency with > 1.5M cores, relative to 16 cores, with 36 directions per octant and 48 energy groups. We demonstrate similar efficiencies with PDT on a realistic set of nuclear-reactor test problems, with unstructured meshes that resolve fine geometric details. These results demonstrate that discrete-ordinates transport sweeps can be executed with high efficiency using more than $10^6$ parallel processes. (C) 2020 Published by Elsevier Inc.

Keywords: Parallel transport sweeps; Parallel algorithms; STAPL; Performance models; Unstructured mesh; Scheduling algorithms

Parallel computation of alpha complexes for biomolecules

COMPUTATIONAL GEOMETRY-THEORY AND APPLICATIONS 2020, Vol. 90, Article 101651

Authors: Bin Masood, Talha; Ray, Tathagata; Natarajan, Vijay. Affiliations: Linkoping Univ, Sci Visualizat Grp, Norrkoping, Sweden; BITS Pilani, Hyderabad Campus, Hyderabad, India; Indian Inst Sci, Dept Comp Sci & Automat, Bangalore, Karnataka, India

The alpha complex, a subset of the Delaunay triangulation, has been extensively used as the underlying representation for biomolecular structures. We propose a GPU-based parallel algorithm for the computation of the alpha complex, which exploits the knowledge of typical spatial distribution and sizes of atoms in a biomolecule. Unlike existing methods, this algorithm does not require prior construction of the Delaunay triangulation. The algorithm computes the alpha complex in two stages. The first stage proceeds in a bottom-up fashion and computes a superset of the edges, triangles, and tetrahedra belonging to the alpha complex. The false positives from this estimation stage are removed in a subsequent pruning stage to obtain the correct alpha complex. Computational experiments on several biomolecules demonstrate the superior performance of the algorithm, up to a factor of 50 when compared to existing methods that are optimized for biomolecules. (C) 2020 Elsevier B.V. All rights reserved.

Keywords: Delaunay triangulation; Parallel algorithms; Biomolecules; GPU

A fine-grained loop-level parallel approach to efficient fuzzy community detection in complex networks

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE 2020, Vol. 32, No. 5, Article e5537

Authors: Munoz-Caro, Camelia; Nino, Alfonso; Reyes, Sebastian. Affiliation: Univ Castilla La Mancha, Escuela Super Informat, Paseo Univ 4, Ciudad Real 13004, Spain

Determining the inner organizational structure of sets of networked elements is of paramount importance to analyze real-world systems such as social, biological, or economic networks. To such an end, it is necessary to identify communities of interrelated nodes within the networks. Recently, a fuzzy community detection approach based on the minimization of a topological error functional has been proposed in the form of a gradient-based algorithm design pattern. However, the intrinsic quadratic algorithmic complexity of the procedure limits the problem size that can be efficiently treated. Here, we extend the ability of this approach to analyze larger networks by resorting to parallelism. Thus, we identify the concurrency sources in the gradient-based algorithm design pattern. To determine the parallelization limits, we develop a two-dimensional performance model as a function of the number of processors and network size. The model makes it possible to compute the maximum possible speedup. Another model is presented to find the maximum problem size tractable in a given amount of time. Application of the previous models to a set of benchmark networks shows that parallelization enhances the proposed fuzzy community detection approach by more than an order of magnitude. This allows treatment of networks with several hundred thousand nodes in a time frame of hours.

Keywords: complex networks; fuzzy communities; machine learning; parallel algorithms; performance model

Performance Analysis of a Multicore Implementation for Solving a Two-Dimensional Inverse Anomalous Diffusion Problem

3rd International Conference on Numerical Computations - Theory and Algorithms (NUMTA)

Authors: De Luca, Pasquale; Galletti, Ardelio; Giunta, Giulio; Marcellino, Livia; Raei, Marzie. Affiliations: Univ Salerno, Dept Comp Sci, Fisciano, Italy; Univ Naples Parthenope, Dept Sci & Technol, Naples, Italy; Malek Ashtar Univ Technol, Dept Math, Esfahan, Iran

ISBN: (print) 9783030390815

In this work we deal with the solution of a two-dimensional inverse time fractional diffusion equation, involving a Caputo fractional derivative in its expression. Since we deal with a huge practical problem with a large domain, starting from an accurate meshless localized collocation method using RBFs, we propose a fast algorithm, implemented on a multicore architecture, which exploits suitable parallel computational kernels. In more detail, we first developed a C code based on the numerical library LAPACK to perform the basic linear algebra operations and to solve linear systems; then, due to the high computational complexity and the large size of the problem, we propose a parallel algorithm specifically designed for multicore architectures and based on the Pthreads library. Performance analysis shows the accuracy and reliability of our parallel implementation.

Keywords: Fractional models; Multicore architecture; Parallel algorithms
