Search Results - Inner Mongolia University Library

SEMIAUTOMATIC TASK GRAPH CONSTRUCTION FOR H-MATRIX ARITHMETIC

SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2022, Vol. 44, No. 2, pp. C77-C98

Authors: Boerm, Steffen; Christophersen, Sven; Kriemann, Ronald (Univ Kiel, Dept Math, D-24118 Kiel, Germany; Max Planck Inst Math Sci, D-04103 Leipzig, Germany)

A new method to construct task graphs for H-matrix arithmetic is introduced, which uses the information associated with all tasks of the standard recursive H-matrix algorithms, e.g., the block index set of the matrix blocks involved in the computation. Task refinement, i.e., the replacement of tasks by subcomputations, is then used to proceed in the H-matrix hierarchy until the matrix blocks containing the actual matrix data are reached. This process is a natural extension of the classical, recursive way in which H-matrix arithmetic is defined and thereby simplifies the efficient usage of many-core systems. Numerical examples for model problems with different block structures demonstrate the various properties of the new approach.

Keywords: hierarchical matrices; task graph; parallel algorithms; many-core processors
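
The task refinement described in the abstract above can be illustrated with a small Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the toy recursion is a dense block update C += A*B on a quadtree of square blocks, and the names `Block`, `MulTask`, `conflicts`, and `build_task_graph` are hypothetical. Tasks are refined until a leaf block size is reached, and an edge is added between two leaf tasks whenever the block index set written by one overlaps a block index set touched by the other.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Block:
    """Square block of a named matrix: rows r0..r0+n-1, columns c0..c0+n-1."""
    matrix: str
    r0: int
    c0: int
    n: int

    def child(self, i, j):
        h = self.n // 2
        return Block(self.matrix, self.r0 + i * h, self.c0 + j * h, h)

    def overlaps(self, other):
        # Index sets overlap only within the same matrix.
        return (self.matrix == other.matrix
                and self.r0 < other.r0 + other.n and other.r0 < self.r0 + self.n
                and self.c0 < other.c0 + other.n and other.c0 < self.c0 + self.n)


@dataclass(frozen=True)
class MulTask:
    """Task C += A * B; c is written, a and b are only read."""
    c: Block
    a: Block
    b: Block

    def refine(self, leaf_size):
        # Replace the task by its eight subcomputations, or return [] at a leaf.
        if self.c.n <= leaf_size:
            return []
        return [MulTask(self.c.child(i, j), self.a.child(i, k), self.b.child(k, j))
                for i, j, k in product(range(2), repeat=3)]


def conflicts(t1, t2):
    # A dependency exists if one task writes a block the other reads or writes.
    return (t1.c.overlaps(t2.c)
            or t1.c.overlaps(t2.a) or t1.c.overlaps(t2.b)
            or t2.c.overlaps(t1.a) or t2.c.overlaps(t1.b))


def build_task_graph(root, leaf_size):
    leaves = []

    def refine_all(task):
        subtasks = task.refine(leaf_size)
        if not subtasks:
            leaves.append(task)          # leaf task: works on actual matrix data
        for sub in subtasks:
            refine_all(sub)              # keeps the sequential (recursive) order

    refine_all(root)
    # Edge (i, j): leaf task i must finish before leaf task j may start.
    edges = {(i, j)
             for j in range(len(leaves)) for i in range(j)
             if conflicts(leaves[i], leaves[j])}
    return leaves, edges
```

In this sketch the pairwise conflict test is quadratic in the number of leaf tasks and also records transitive edges; it is meant only to show how index sets alone can expose dependencies.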


Parallel Approximate Undirected Shortest Paths via Low Hop Emulators 2020

52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC)

Authors: Andoni, Alexandr; Stein, Clifford; Zhong, Peilin (Columbia Univ, New York, NY 10027, USA)

ISBN: (print) 9781450369794

We present a $(1+\epsilon)$-approximate parallel algorithm for computing shortest paths in undirected graphs, achieving $\mathrm{poly}(\log n)$ depth and $m\,\mathrm{poly}(\log n)$ work for graphs with $n$ nodes and $m$ edges. Although sequential algorithms with (nearly) optimal running time have been known for several decades, near-optimal parallel algorithms have turned out to be a much tougher challenge. For $(1+\epsilon)$-approximation, all prior algorithms with $\mathrm{poly}(\log n)$ depth perform at least $\Omega(mn^{c})$ work for some constant $c > 0$. Improving this long-standing upper bound obtained by Cohen (STOC'94) has been open for 25 years. We develop several new tools of independent interest. One of them is a new notion beyond hopsets, the low hop emulator: a $\mathrm{poly}(\log n)$-approximate emulator graph in which every shortest path has at most $O(\log\log n)$ hops (edges). Direct applications of the low hop emulators are parallel algorithms for $\mathrm{poly}(\log n)$-approximate single source shortest path (SSSP), Bourgain's embedding, metric tree embedding, and low diameter decomposition, all with $\mathrm{poly}(\log n)$ depth and $m\,\mathrm{poly}(\log n)$ work. To boost the approximation ratio to $(1+\epsilon)$, we introduce compressible preconditioners and apply them inside Sherman's framework (SODA'17) to solve the more general problem of uncapacitated minimum cost flow (a.k.a. the transshipment problem). Our algorithm computes a $(1+\epsilon)$-approximate uncapacitated minimum cost flow in $\mathrm{poly}(\log n)$ depth using $m\,\mathrm{poly}(\log n)$ work. As a consequence, it also improves the state-of-the-art sequential running time from $m \cdot 2^{O(\sqrt{\log n})}$ to $m\,\mathrm{poly}(\log n)$.

Keywords: parallel algorithms; shortest paths; minimum cost flow; low hop emulators
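
To see why the number of hops on a shortest path matters for parallel depth, consider the sketch below. It is not the algorithm of the abstract above, only the textbook round-synchronous Bellman-Ford relaxation: after `max_hops` rounds it holds the shortest distances over paths with at most `max_hops` edges, and all relaxations within one round are independent of each other. The function name and the edge-list format `(u, v, weight)` are assumptions made for this example.

```python
import math


def hop_limited_distances(n, edges, source, max_hops):
    """Distances from source using paths of at most max_hops edges.

    edges: list of (u, v, weight) for an undirected graph on vertices 0..n-1.
    """
    dist = [math.inf] * n
    dist[source] = 0.0
    for _ in range(max_hops):
        new = dist[:]                      # read old values, write new ones
        for u, v, w in edges:              # each relaxation is independent
            if dist[u] + w < new[v]:
                new[v] = dist[u] + w
            if dist[v] + w < new[u]:
                new[u] = dist[v] + w
        dist = new
    return dist
```

On a graph in which every shortest path has few hops, a small `max_hops` already yields exact distances, which is the property a low hop emulator is designed to provide approximately.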


Parallelization of a Meshless Geometric Multigrid Method Based on Domain Decomposition and Coarse Matrix Aggregation Algorithm

SSRN, 2023

Authors: Ha, Sang Truong; Yoon, Han Young; Choi, Hyoung Gwon (Seoul National University of Science and Technology, Korea, Republic of; Korea Atomic Energy Research Institute, 989-111 Daeduk-daero, Daejeon 34057, Korea, Republic of; Dept. of Mechanical and Automotive Engineering, Seoul National University of Science and Technology, Korea, Republic of)

We investigated three parallel algorithms for a meshless geometric multigrid (GMG) method recently proposed for the linear finite element discretization of elliptic partial differential equations. These methods are based on the message passing interface (MPI) for domain decomposition and the coarse matrix aggregation (CMA) algorithm for coarser levels. We propose a parallel implementation of the Galerkin condition for a meshless GMG and a parameter from which the levels of CMA can be *** parameter is defined as the ratio of the sum of the number of external interface nodes for all the subdomains to the total number of non-zero entries of an assembled matrix of a single domain obtained by matrix aggregation on a coarse level. Three methods (M1, M2, and M3) are classified depending on how the coarsest matrix is solved and how CMA is applied for coarser levels. M1 (M2) solves the coarsest matrix via an iterative (direct) solver using CMA only for the coarsest level, whereas M3 determines the levels with CMA using a parameter proposed in this study and employs a direct solver for the coarsest matrix. We found that M3 is more efficient than M1 and M2 and much more efficient in the case of complicated geometry because CPU times are significantly reduced compared to other methods at coarser levels. Furthermore, superlinear scalability was achieved owing to the cache effect for a problem size of more than 1 million with fewer than 64 processors. © 2023, The Authors. All rights reserved.

Keywords: parallel algorithms


Parallel Peeling Algorithms 14

26th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Authors: Jiang, Jiayang; Mitzenmacher, Michael; Thaler, Justin (Harvard Univ, Sch Engn & Appl Sci, Cambridge, MA 02138, USA; Univ Calif Berkeley, Simons Inst Theory Comp, Berkeley, CA, USA)

ISBN: (print) 9781450328210

The analysis of several algorithms and data structures can be framed as a peeling process on a random hypergraph: vertices with degree less than $k$ are removed until there are no vertices of degree less than $k$ left. The remaining hypergraph is known as the $k$-core. In this paper, we analyze parallel peeling processes, where in each round, all vertices of degree less than $k$ are removed. It is known that, below a specific edge density threshold, the $k$-core is empty with high probability. We show that, with high probability, below this threshold, only $\frac{1}{\log((k-1)(r-1))} \log\log n + O(1)$ rounds of peeling are needed to obtain the empty $k$-core for $r$-uniform hypergraphs. Interestingly, we show that above this threshold, $\Omega(\log n)$ rounds of peeling are required to find the non-empty $k$-core. Since most algorithms and data structures aim to peel to an empty $k$-core, this asymmetry appears fortunate. We verify the theoretical results both with simulation and with a parallel implementation using graphics processing units (GPUs). Our implementation provides insights into how to structure parallel peeling algorithms for efficiency in practice.

Keywords: parallel algorithms; peeling algorithms; GPU implementations; invertible Bloom lookup tables; random hypergraphs
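
The round-based peeling process described in the abstract above can be sketched as follows. This is a minimal sequential simulation of the parallel rounds, assuming each hyperedge is given as a tuple of distinct vertex ids; the function name `parallel_peel` and its return format are chosen for this example only and do not come from the paper.

```python
from collections import defaultdict


def parallel_peel(num_vertices, edges, k):
    """Simulate round-synchronous peeling on a hypergraph.

    In each round, every vertex of degree < k is removed together with all
    hyperedges containing it. Returns (vertices of the k-core, rounds used).
    """
    alive_edges = {i: tuple(e) for i, e in enumerate(edges)}
    incident = defaultdict(set)            # vertex -> ids of live incident edges
    for i, e in alive_edges.items():
        for v in e:
            incident[v].add(i)

    alive = set(range(num_vertices))
    rounds = 0
    while True:
        # Degrees are read at the start of the round, as in a parallel step.
        to_remove = {v for v in alive if len(incident[v]) < k}
        if not to_remove:
            break
        rounds += 1
        dead_edges = set()
        for v in to_remove:
            dead_edges |= incident[v]
        for i in dead_edges:
            for v in alive_edges[i]:
                incident[v].discard(i)
            del alive_edges[i]
        alive -= to_remove
    return alive, rounds
```

For example, with `k = 2` a path on four vertices peels to the empty core in two rounds, while a triangle is already its own 2-core and needs no rounds.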


An Efficient and Parallel Electromagnetic Solver for Complex Interconnects in Layered Media 29

IEEE 29th Conference on Electrical Performance of Electronic Packaging and Systems (EPEPS)

Authors: Marek, Damian; Sharma, Shashwat; Triverio, Piero (Univ Toronto, Edward S Rogers Sr Dept Elect & Comp Engn, Toronto, ON, Canada)

ISBN: (print) 9781728161617

A novel parallel solver based on the adaptive integral method (AIM) is proposed for the electromagnetic analysis of electrical interconnects in layered media. We show that graph partitioning techniques can be used to optimally distribute, across thousands of processes, the computations related to both matrix filling and system solution. The proposed workload distribution strategy is compared to existing techniques through a scalability study on a large realistic interposer model in layered media.

Keywords: surface integral equation method; adaptive integral method; parallel algorithms; skin effect modeling


Parallel Numerical Algorithms for Simulation of Rectangular Waveguides by Using GPU 1

10th International Conference on Parallel Processing and Applied Mathematics (PPAM)

Authors: Ciegis, Raimondas; Bugajev, Andrej; Kancleris, Zilvinas; Slekas, Gediminas (Vilnius Gediminas Tech Univ, LT-10223 Vilnius, Lithuania)

ISBN: (digital) 9783642551956

ISBN: (print) 9783642551956

In this article we consider parallel numerical algorithms for solving a 3D mathematical model that describes wave propagation in a rectangular waveguide. The main goal is to formulate and analyze a minimal algorithmic template to solve this problem by using the CUDA platform. This template is based on explicit finite difference schemes obtained after approximation of systems of differential equations on a staggered grid. The parallelization of the discrete algorithm is based on the domain decomposition method. A theoretical complexity model is derived and the scalability of the parallel algorithm is investigated. Results of numerical simulations are presented.

Keywords: parallel algorithms; Numerical simulation; Wave propagation; GPU; CUDA; Scalability analysis
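
As a rough illustration of an explicit finite difference scheme on a staggered grid, the sketch below advances a generic 1D first-order wave system e_t = h_x, h_t = e_x by one time step. It is a simplified stand-in chosen for this example, not the 3D waveguide model or the CUDA template of the abstract above; the function name, the fixed boundary values, and the parameter `c = dt/dx` are assumptions.

```python
def staggered_step(e, h, c):
    """One explicit step for e_t = h_x, h_t = e_x on a 1D staggered grid.

    e: n values at grid nodes; h: n-1 values at cell midpoints; c = dt/dx.
    Returns the updated (e, h) as new lists.
    """
    n = len(e)
    # Each h value depends only on its two neighbouring e values.
    h_new = [h[i] + c * (e[i + 1] - e[i]) for i in range(n - 1)]
    # Boundary node values e[0] and e[n-1] are held fixed (assumption).
    e_new = e[:]
    for i in range(1, n - 1):
        # Each interior e value depends only on its two neighbouring h values.
        e_new[i] = e[i] + c * (h_new[i] - h_new[i - 1])
    return e_new, h_new
```

Every entry in each of the two sweeps is updated independently from neighbouring values only, which is why schemes of this kind map naturally onto one GPU thread per grid point and onto domain decomposition with a thin halo exchange.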


Parallel Planar Subgraph Isomorphism and Vertex Connectivity 20

32nd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Authors: Gianinazzi, Lukas; Hoefler, Torsten (Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland)

ISBN: (print) 9781450369350

We present the first parallel fixed-parameter algorithm for subgraph isomorphism in planar graphs, bounded-genus graphs, and, more generally, all minor-closed graphs of locally bounded treewidth. Our randomized low depth algorithm has a near-linear work dependency on the size of the target graph. Existing low depth algorithms do not guarantee that the work remains asymptotically the same for any constant-sized pattern. By using a connection to certain separating cycles, our subgraph isomorphism algorithm can decide the vertex connectivity of a planar graph (with high probability) in asymptotically near-linear work and poly-logarithmic depth. Previously, no sub-quadratic work and poly-logarithmic depth bound was known in planar graphs (in particular for distinguishing between four-connected and five-connected planar graphs).

Keywords: graph algorithms; parallel algorithms; subgraph isomorphism; planar graphs; vertex connectivity; parameterized complexity


Design of Longitudinal Anti-Disturbance Control System for Aircraft Based on Distributed Parallel Algorithm 5

5th IEEE International Conference on Advanced Robotics and Mechatronics (ICARM)

Authors: Lang, Pengfei; Liu, Zun; Ge, Meng (China Acad Launch Vehicle Technol, Beijing, Peoples R China; Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China; Beijing Aerosp Inst Metrol & Measurement Technol, Beijing, Peoples R China)

ISBN: (digital) 9781728164793

ISBN: (print) 9781728164793

Aiming at the problem of poor control stability of traditional aircraft control systems, a longitudinal anti-disturbance control system based on a distributed parallel algorithm was designed. The anti-disturbance control system was built on the hardware of the original control system, and the software part of the aircraft longitudinal anti-disturbance control system was designed. A longitudinal model of the aircraft was established, and an active disturbance rejection controller (ADRC) was designed according to the model. Distributed parallel algorithms were used to set the parameters of the ADRC, thus completing the design of the vertical ADRC system. A comparison experiment with the traditional PD-based aircraft control system shows that the control system based on the distributed parallel algorithm has less overshoot, good stability, and broad application prospects.

Keywords: Control systems; Aerospace control; Aircraft; parallel algorithms; Atmospheric modeling; Stability analysis; Hardware


An Algorithm for the Sequence Alignment with Gap Penalty Problem using Multiway Divide-and-Conquer and Matrix Transposition

INFORMATION PROCESSING LETTERS, 2022, Vol. 173

Authors: Shubham; Prakash, Surya; Ganapathi, Pramod (Indian Inst Technol Indore, Discipline Comp Sci & Engn, Indore, India; SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY 11794, USA)

We present a cache-efficient parallel algorithm for the sequence alignment with gap penalty problem for shared-memory machines using multiway divide-and-conquer and not-in-place matrix transposition. Our $r$-way divide-and-conquer algorithm, for a fixed natural number $r \ge 2$, performs $\Theta(n^{3})$ work, achieves $\Theta(n^{\log_r(2r-1)})$ span, and incurs $O\left(n^{3}/(BM) + (n^{2}/B)\log\sqrt{M}\right)$ serial cache misses for $n > \gamma M$, and incurs $O\left((n^{2}/B)\log(n/\sqrt{M})\right)$ serial cache misses for $\alpha\sqrt{M} < n \le \gamma M$, where $M$ is the cache size, $B$ is the cache line size, and $\alpha$ and $\gamma$ are constants. Published by Elsevier B.V.

Keywords: Sequence alignment; parallel algorithms; Multiway divide-and-conquer; Dynamic programming; Cache-efficient
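
For orientation, the sequence alignment with gap penalty problem is commonly stated through a dynamic programming recurrence in which every cell looks back over its entire row and column, which is where the cubic work in the abstract above comes from. The sketch below is that plain serial recurrence, not the cache-efficient divide-and-conquer algorithm of the paper; the function name and the callback parameters `sub_cost` and `gap_cost` are assumptions made for this example.

```python
def align_with_gap_penalty(x, y, sub_cost, gap_cost):
    """Minimum alignment cost of sequences x and y with a general gap penalty.

    sub_cost(a, b): cost of aligning symbol a with symbol b.
    gap_cost(length): cost of one contiguous gap of the given length.
    Runs in cubic time for sequences of similar length.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0 and j > 0:
                # Align x[i-1] with y[j-1].
                best = D[i - 1][j - 1] + sub_cost(x[i - 1], y[j - 1])
            for q in range(j):
                # End with a gap covering y[q:j].
                best = min(best, D[i][q] + gap_cost(j - q))
            for p in range(i):
                # End with a gap covering x[p:i].
                best = min(best, D[p][j] + gap_cost(i - p))
            D[i][j] = best
    return D[n][m]
```

With a unit substitution cost and `gap_cost(length) = length`, the recurrence reduces to the ordinary edit distance; a penalty such as `2 + length` makes one long gap cheaper than several short ones.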


Parallel and Scalable Precise Clustering 20

ACM International Conference on Parallel Architectures and Compilation Techniques (PACT)

Authors: Byma, Stuart; Dhasade, Akash; Altenhoff, Adrian; Dessimoz, Christophe; Larus, James R. (Ecole Polytech Fed Lausanne, Lausanne, Switzerland; IIT Tirupati, Tirupati, Andhra Pradesh, India; Swiss Fed Inst Technol, Zurich, Switzerland; Univ Lausanne, Lausanne, Switzerland)

ISBN: (print) 9781450380751

This paper describes a new technique for parallelizing protein clustering, an important bioinformatics computation for the analysis of protein sequences. Protein clustering identifies groups of proteins that are similar because they share long sequences of similar amino acids. Given a collection of protein sequences, clustering can significantly reduce the computational effort required to identify all similar sequences by avoiding many negative comparisons. The challenge, however, is to build a clustering that misses as few similar sequences (or elements, more generally) as possible. In this paper, we introduce precise clustering, a property that requires each pair of similar elements to appear together in at least one cluster. We show that transitivity in the data can be leveraged to merge clusters while maintaining a precise clustering, providing a basis for independently forming clusters. This allows us to reformulate clustering as a bottom-up merge of independent clusters in a new algorithm called ClusterMerge. ClusterMerge exposes parallelism, enabling fast and scalable implementations. We apply ClusterMerge to find similar amino acid sequences in a collection of proteins. ClusterMerge identifies 99.8% of similar pairs found by a full $O(n^{2})$ comparison, with only half as many comparisons. More importantly, ClusterMerge is highly amenable to parallel and distributed computation. Our implementation achieves a speedup of 604 times on 768 cores (1400 times faster than a comparable single-threaded clustering implementation), a strong scaling efficiency of 90%, and a weak scaling efficiency of nearly 100%.

Keywords: bioinformatics; protein clustering; parallel algorithms
