Search Results - Inner Mongolia University Library

A Parallel Algorithm for Solving Linear Parabolic Evolution Equations

9th Parallel-in-Time Workshop (PinT)

Authors: van Venetie, Raymond; Westerdiep, Jan. Affiliation: Univ Amsterdam, Korteweg de Vries (KdV) Inst Math, POB 94248, NL-1090 GE Amsterdam, Netherlands

ISBN: (digital) 9783030759339

ISBN: (print) 9783030759339; 9783030759322

We present an algorithm for the solution of a simultaneous space-time discretization of linear parabolic evolution equations with a symmetric differential operator in space. Building on earlier work, we recast this discretization into a Schur complement equation whose solution is a quasi-optimal approximation to the weak solution of the equation at hand. Choosing a tensor-product discretization, we arrive at a remarkably simple linear system. Using wavelets in time and standard finite elements in space, we solve the resulting system in linear complexity on a single processor, and in polylogarithmic complexity when parallelized in both space and time. We complement these theoretical findings with large-scale parallel computations showing the effectiveness of the method.
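Tensor-product space-time discretizations typically produce operators built from Kronecker products of a temporal and a spatial matrix (for example $A_t \otimes M_x + M_t \otimes A_x$), and such operators can be applied without ever forming the large matrix. The sketch below shows only that generic identity in pure Python. It is not the authors' Schur complement system or wavelet-in-time solver, and `kron` and `kron_matvec` are hypothetical helpers.

```python
def kron(A, B):
    """Explicit Kronecker product of square matrices A (n x n) and B (m x m)."""
    n, m = len(A), len(B)
    return [[A[i][k] * B[j][l] for k in range(n) for l in range(m)]
            for i in range(n) for j in range(m)]


def kron_matvec(A, B, x):
    """Compute (A kron B) @ x without forming the nm x nm matrix.

    x has length n*m and is read row-major as an n x m matrix X with
    X[k][l] = x[k*m + l]; then (A kron B) x = vec(A X B^T).
    """
    n, m = len(A), len(B)
    X = [x[k * m:(k + 1) * m] for k in range(n)]
    # First factor: AX = A @ X (n x m), acting along the "time" index k.
    AX = [[sum(A[i][k] * X[k][l] for k in range(n)) for l in range(m)]
          for i in range(n)]
    # Second factor: Y = AX @ B^T (n x m), acting along the "space" index l.
    Y = [[sum(AX[i][l] * B[j][l] for l in range(m)) for j in range(m)]
         for i in range(n)]
    return [Y[i][j] for i in range(n) for j in range(m)]


if __name__ == "__main__":
    A = [[2.0, -1.0], [-1.0, 2.0]]  # stand-in for a small temporal matrix
    B = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]  # stand-in spatial matrix
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    dense = [sum(r * v for r, v in zip(row, x)) for row in kron(A, B)]
    print(kron_matvec(A, B, x), dense)
```

Applied this way, $A \otimes B$ costs $O(nm(n+m))$ operations instead of $O(n^2 m^2)$, and the two small matrix products can be parallelized independently, which hints at why separable space-time structure suits parallel solvers.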

Keywords: Parabolic PDEs; Space-time variational formulations; Optimal preconditioning; Parallel algorithms; Massively parallel computing

Efficient parallel and fast convergence chaotic Jaya algorithms

SWARM AND EVOLUTIONARY COMPUTATION, 2020, Vol. 56, p. 100698

Authors: Migallon, H.; Jimeno-Morenilla, A.; Sanchez-Romero, J. L.; Belazi, A. Affiliations: Univ Miguel Hernandez, Dept Comp Engn, E-03202 Alicante, Spain; Univ Alicante, Dept Comp Technol, E-03071 Alicante, Spain; Tunis El Manar Univ, Lab RISC, ENIT LR 16 ES07, Tunis 1002, Tunisia

The Jaya algorithm is a recent heuristic approach for solving optimisation problems. It involves a random search for the global optimum, based on the generation of new individuals using both the best and the worst individuals in the population, thus moving solutions towards the optimum while avoiding the worst current solution. In addition to its performance in terms of optimisation, a lack of control parameters is another significant advantage of this algorithm. However, the number of iterations needed to reach the optimal solution, or close to it, may be very high, and the computational cost can hamper compliance with time requirements. In this work, a chaotic two-dimensional (2D) map is used to accelerate convergence, and parallel algorithms are developed to alleviate the computational cost. Coarse- and fine-grained parallel algorithms are developed, the former based on multi-populations and the latter at the individual level, and in both cases these are accelerated by an improved (computational) use of the chaos map.
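The update described above can be sketched with the commonly cited Jaya rule $x' = x + r_1(x_{\text{best}} - |x|) - r_2(x_{\text{worst}} - |x|)$ followed by greedy acceptance. This is a minimal sequential sketch assuming uniform random $r_1, r_2$; the paper's 2D chaotic map (which would supply those numbers) and its OpenMP parallelization are not reproduced, and `jaya` and `sphere` are illustrative names.

```python
import random


def sphere(x):
    """Toy objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def jaya(f, dim, pop_size=20, iters=300, lo=-5.0, hi=5.0, seed=0):
    """Minimise f over [lo, hi]^dim with the basic Jaya update (no chaos map)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(p) for p in pop]
    for _ in range(iters):
        best = pop[min(range(pop_size), key=fit.__getitem__)]
        worst = pop[max(range(pop_size), key=fit.__getitem__)]
        for i in range(pop_size):
            x = pop[i]
            cand = []
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()  # a chaotic map would supply these
                # Move towards the best and away from the worst individual.
                v = x[j] + r1 * (best[j] - abs(x[j])) - r2 * (worst[j] - abs(x[j]))
                cand.append(min(hi, max(lo, v)))  # clip to the search box
            fc = f(cand)
            if fc < fit[i]:  # greedy selection: keep the candidate only if it improves
                pop[i], fit[i] = cand, fc
    k = min(range(pop_size), key=fit.__getitem__)
    return pop[k], fit[k]


if __name__ == "__main__":
    print(jaya(sphere, dim=5))
```

Each candidate depends only on the shared best and worst vectors, so the loop over `i` is a natural place for the individual-level (fine-grained) parallelism the abstract describes, while independent sub-populations correspond to the coarse-grained variant.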

Keywords: Optimisation; Jaya algorithm; Chaotic map; Parallel algorithms; OpenMP

Automated Verification of the Parallel Bellman-Ford Algorithm

28th International Static Analysis Symposium (SAS)

Authors: Safari, Mohsen; Oortwijn, Wytse; Huisman, Marieke. Affiliations: Univ Twente, Formal Methods & Tools, Enschede, Netherlands; ESI TNO, Eindhoven, Netherlands

ISBN: (digital) 9783030888060

ISBN: (print) 9783030888060; 9783030888053

Many real-world problems, such as internet routing, are graph problems. To develop efficient solutions to such problems, a growing number of parallel graph algorithms have been proposed. This paper discusses the mechanized verification of a commonly used parallel graph algorithm, namely the Bellman-Ford algorithm, which provides an inherently parallel solution to the Single-Source Shortest Path problem. Concretely, we verify an unoptimized GPU version of the Bellman-Ford algorithm, using the VerCors verifier. The main challenge that we had to address was to find suitable global invariants of the graph-based properties for automated verification. This case study is the first deductive verification to prove functional correctness of the parallel Bellman-Ford algorithm. It provides the basis to verify other, optimized implementations of the algorithm. Moreover, it may also provide a good starting point to verify other parallel graph-based algorithms.
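The inherent parallelism mentioned above comes from the fact that, within one round, every edge can be relaxed independently. The sketch shows a round-synchronous (Jacobi-style) Bellman-Ford in Python. It is a generic illustration, not the GPU kernel verified in the paper, and the function name is hypothetical.

```python
import math


def parallel_style_bellman_ford(n, edges, source):
    """Single-source shortest paths on vertices 0..n-1.

    edges: list of (u, v, w) tuples. Each round relaxes every edge against the
    previous round's distances, so the edges of one round are independent
    (on a GPU, roughly one thread per edge).
    """
    dist = [math.inf] * n
    dist[source] = 0.0
    for _ in range(n - 1):
        new = dist[:]
        for u, v, w in edges:  # independent work items within a round
            cand = dist[u] + w
            if cand < new[v]:
                new[v] = cand  # a GPU version needs an atomic minimum here
        if new == dist:  # no change: distances are final
            break
        dist = new
    for u, v, w in edges:  # an extra improving edge means a negative cycle
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from the source")
    return dist


if __name__ == "__main__":
    edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5)]
    print(parallel_style_bellman_ford(4, edges, 0))
```

Reading from `dist` and writing into a separate `new` array makes each round independent of the order in which edges are processed. After round $k$, each distance equals the length of a shortest path using at most $k$ edges, which is the kind of global property a deductive verifier would need to state as an invariant.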

Keywords: Deductive verification; Graph algorithms; Parallel algorithms; GPU; Bellman-Ford; Case study

Accelerated Stochastic Gradient for Nonnegative Tensor Completion and Parallel Implementation

29th European Signal Processing Conference (EUSIPCO)

Authors: Siaminou, Ioanna; Papagiannakos, Ioannis Marios; Kolomvakis, Christos; Liavas, Athanasios P. Affiliation: Tech Univ Crete, Sch Elect & Comp Engn, Khania, Greece

ISBN: (print) 9789082797060

We consider the problem of nonnegative tensor completion. We adopt the alternating optimization framework and solve each nonnegative matrix completion problem via a stochastic variation of the accelerated gradient algorithm. We experimentally test the effectiveness and the efficiency of our algorithm using both real-world and synthetic data. We develop a shared-memory implementation of our algorithm using the multi-threaded API OpenMP, which attains significant speedup. We believe that our approach is a very competitive candidate for the solution of very large nonnegative tensor completion problems.
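In nonnegative matrix completion, updating one row of a factor reduces to a nonnegative least-squares problem over that row's observed entries, $\min_{x \ge 0} \|Ax - b\|_2^2$, where the rows of $A$ are the other factor's rows for the observed columns. The sketch applies projected Nesterov-accelerated gradient (FISTA-style momentum) to that subproblem. It is deterministic and full-batch, whereas the paper uses a stochastic variant, and the step size uses the bound $L = 2\|A\|_F^2$ on the gradient's Lipschitz constant, an assumption chosen for simplicity. `nnls_accelerated` is a hypothetical name.

```python
import math


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Return A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def nnls_accelerated(A, b, iters=500):
    """Minimise ||A x - b||^2 subject to x >= 0 with projected accelerated gradient."""
    L = 2.0 * sum(a * a for row in A for a in row)  # upper bound on the Lipschitz constant
    n = len(A[0])
    x = [0.0] * n
    y = x[:]
    t = 1.0
    for _ in range(iters):
        r = [ai - bi for ai, bi in zip(matvec(A, y), b)]  # residual at extrapolated point
        g = [2.0 * v for v in rmatvec(A, r)]  # gradient of the squared residual
        x_new = [max(0.0, yi - gi / L) for yi, gi in zip(y, g)]  # step, then project onto x >= 0
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Nesterov extrapolation using the last two iterates.
        y = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x


if __name__ == "__main__":
    print(nnls_accelerated([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, -1.0, 0.0], iters=2000))
```

On this example the unconstrained least-squares solution `[1, -1]` is infeasible, so the nonnegativity constraint becomes active on the second coordinate and the iterate approaches `[0.5, 0.0]`.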

Keywords: tensors; stochastic gradient; nonnegative tensor completion; optimal first-order optimization algorithms; parallel algorithms; OpenMP

A Deterministic Parallel APSP Algorithm and its Applications

32nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)

Authors: Karczmarz, Adam; Sankowski, Piotr. Affiliation: Univ Warsaw, Inst Informat, Warsaw, Poland

ISBN: (print) 9781611976465

In this paper we show a deterministic parallel all-pairs shortest paths algorithm for real-weighted directed graphs. The algorithm has $\tilde{O}(nm + (n/d)^3)$ work and $\tilde{O}(d)$ depth for any depth parameter $d \in [1, n]$. To the best of our knowledge, such a trade-off has only been previously described for the real-weighted single-source shortest paths problem using randomization [Bringmann et al., ICALP'17]. Moreover, our result improves upon the parallelism of the state-of-the-art randomized parallel algorithm for computing transitive closure, which has $\tilde{O}(nm + n^3/d^2)$ work and $\tilde{O}(d)$ depth [Ullman and Yannakakis, SIAM J. Comput. '91]. Our APSP algorithm turns out to be a powerful tool for designing efficient planar graph algorithms in both parallel and sequential regimes. By suitably adjusting the depth parameter $d$ and applying known techniques, we obtain: (1) nearly work-efficient $\tilde{O}(n^{1/6})$-depth parallel algorithms for the real-weighted single-source shortest paths problem and finding a bipartite perfect matching in a planar graph, (2) an $\tilde{O}(n^{9/8})$-time sequential strongly polynomial algorithm for computing a minimum mean cycle or a minimum cost-to-time-ratio cycle of a planar graph, (3) a slightly faster algorithm for computing so-called external dense distance graphs of all pieces of a recursive decomposition of a planar graph. One notable ingredient of our parallel APSP algorithm is a simple deterministic $\tilde{O}(nm)$-work $\tilde{O}(d)$-depth procedure for computing $\tilde{O}(n/d)$-size hitting sets of shortest $d$-hop paths between all pairs of vertices of a real-weighted digraph. Such hitting sets have also been called $d$-hub sets. Hub sets have previously proved especially useful in designing parallel or dynamic shortest paths algorithms and are typically obtained via random sampling. Our procedure implies, for example, an $\tilde{O}(nm)$-time deterministic
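The hub-set idea can be illustrated with the textbook greedy hitting-set heuristic: given the vertex sets of some paths (for instance, shortest $d$-hop paths), repeatedly pick the vertex lying on the most paths not yet hit. This is not the paper's deterministic $\tilde{O}(nm)$-work procedure, the sketch takes the paths as given rather than computing them, and `greedy_hitting_set` is a hypothetical helper.

```python
from collections import Counter


def greedy_hitting_set(paths):
    """Return a set of vertices that intersects every path.

    paths: iterable of non-empty vertex sequences. Greedy rule: repeatedly take
    the vertex lying on the most paths that are not yet hit.
    """
    remaining = [set(p) for p in paths]
    if any(not p for p in remaining):
        raise ValueError("every path must contain at least one vertex")
    hubs = set()
    while remaining:
        counts = Counter(v for p in remaining for v in p)
        v, _ = counts.most_common(1)[0]
        hubs.add(v)
        remaining = [p for p in remaining if v not in p]  # drop paths now hit
    return hubs


if __name__ == "__main__":
    print(greedy_hitting_set([[1, 2, 3], [3, 4], [5, 3, 6], [7, 8]]))
```

The random-sampling alternative mentioned in the abstract would instead include each vertex independently with probability about $(c \log n)/d$, which, for a large enough constant $c$, hits every path of $d$ vertices with high probability.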

Keywords: parallel algorithms

Large Scale Image Classification Using GPU-based Genetic Programming

Genetic and Evolutionary Computation Conference (GECCO)

Authors: Zeng, Peng; Lensen, Andrew; Sun, Yanan. Affiliations: Sichuan Univ, Coll Comp Sci, Chengdu, Peoples R China; Victoria Univ Wellington, Sch Engn & Comp Sci, Wellington, New Zealand

ISBN: (print) 9781450392686

Genetic programming (GP) has been applied to image classification and has achieved promising results. However, most GP-based image classification methods are applied only to small-scale image datasets because of their high computational cost. Efficient acceleration techniques are needed to extend GP-based image classification methods to large-scale datasets. Since fitness evaluation is the most time-consuming phase of the GP evolution process and is highly parallelizable, this paper proposes a CPU multi-processing and GPU parallel approach to perform it, and thus effectively accelerate GP for image classification. Experimental results show that the highly parallelized approach can significantly accelerate GP-based image classification without performance degradation. The training time of the GP-based image classification method is reduced from several weeks to tens of hours, enabling it to run on large-scale image datasets.
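Because each individual's fitness can be computed independently, evaluation maps naturally onto a worker pool. The sketch uses Python's `concurrent.futures`; the one-node "program" `x > threshold`, the toy `DATA`, and `evaluate_population` are hypothetical stand-ins, not the paper's GP representation or its GPU kernels.

```python
from concurrent.futures import ProcessPoolExecutor

# Hypothetical toy dataset of (feature, label) pairs.
DATA = [(0.1, 0), (0.4, 0), (0.35, 0), (0.8, 1), (0.9, 1), (0.65, 1)]


def fitness(threshold):
    """Accuracy of the hypothetical one-node program 'x > threshold'."""
    correct = sum((x > threshold) == bool(y) for x, y in DATA)
    return correct / len(DATA)


def evaluate_population(population, executor):
    """Score every individual; evaluations are independent, so map them over the executor."""
    return list(executor.map(fitness, population))


if __name__ == "__main__":
    population = [0.0, 0.2, 0.5, 0.7, 1.0]
    with ProcessPoolExecutor() as pool:  # separate processes sidestep the GIL
        print(evaluate_population(population, pool))
```

Passing the executor in keeps the evaluation code independent of the backend: a `ProcessPoolExecutor` suits CPU-bound fitness functions, while a `ThreadPoolExecutor` is enough for quick tests.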

Keywords: genetic programming; image classification; parallel algorithms

Consistent Lock-free Parallel Stochastic Gradient Descent for Fast and Stable Convergence

35th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Backstrom, Karl; Walulya, Ivan; Papatriantafilou, Marina; Tsigas, Philippas. Affiliation: Chalmers Univ Technol, Dept Comp Sci & Engn, Gothenburg, Sweden

ISBN: (print) 9781665440660

Stochastic Gradient Descent (SGD) is an essential element in Machine Learning (ML) algorithms. Asynchronous shared-memory parallel SGD (AsyncSGD), including synchronization-free algorithms such as HOGWILD!, has received interest in certain contexts due to its reduced overhead compared to synchronous parallelization. Although these algorithms induce staleness and inconsistency, they have shown speedup for problems satisfying smooth, strongly convex targets and gradient sparsity. Recent works take important steps towards understanding the potential of parallel SGD for problems not conforming to these strong assumptions, in particular for deep learning (DL). There is, however, a gap in the current literature in understanding when AsyncSGD algorithms are useful in practice, and in particular how mechanisms for synchronization and consistency play a role. We help close this gap by studying a spectrum of parallel algorithmic implementations of AsyncSGD, aiming to understand how shared-data synchronization influences the convergence properties in fundamental DL applications. We focus on the impact of consistency-preserving non-blocking synchronization on SGD convergence and on sensitivity to hyperparameter tuning. We propose Leashed-SGD, an extensible algorithmic framework of consistency-preserving implementations of AsyncSGD, employing lock-free synchronization, effectively balancing throughput and latency. Leashed-SGD features a natural contention-regulating mechanism, as well as dynamic memory management, allocating space only when needed. We argue analytically about the dynamics of the algorithms, memory consumption, the threads' progress over time, and the expected contention. We provide a comprehensive empirical evaluation, validating the analytical claims, benchmarking the proposed Leashed-SGD framework, and comparing to baselines for two prominent DL applications: multilayer perceptrons (MLP) and convolutional neural networks (CNN). We o
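A consistency-preserving alternative to HOGWILD!-style in-place updates is to treat each parameter vector as immutable: a thread reads the latest vector, computes a gradient step, and publishes a new vector with a compare-and-swap (CAS), retrying if another thread published first. The sketch illustrates that pattern on a toy linear regression. It is written in the spirit of the abstract, not as Leashed-SGD itself (there is no contention regulation or memory reclamation), and because CPython exposes no atomic CAS, the hypothetical `AtomicRef` class emulates one with a small lock. All names and the data are hypothetical.

```python
import random
import threading


class AtomicRef:
    """Reference cell with compare-and-swap. A tiny lock emulates the atomic
    instruction here; a real lock-free version would use a hardware pointer CAS."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


def sgd_worker(shared, data, lr, steps, seed):
    rng = random.Random(seed)
    for _ in range(steps):
        x, y = rng.choice(data)
        while True:
            params = shared.load()  # immutable tuple: a consistent snapshot
            w, b = params
            err = w * x + b - y  # residual of the squared loss
            new = (w - lr * err * x, b - lr * err)
            if shared.compare_and_swap(params, new):
                break  # published; otherwise retry on fresher parameters


def train(num_threads=4, steps=500, lr=0.1):
    # Hypothetical noise-free data drawn from y = 3x + 1.
    data = [(k / 10.0, 3.0 * k / 10.0 + 1.0) for k in range(-10, 11)]
    shared = AtomicRef((0.0, 0.0))
    threads = [threading.Thread(target=sgd_worker, args=(shared, data, lr, steps, s))
               for s in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared.load()


if __name__ == "__main__":
    print(train())
```

When a CAS fails, the worker recomputes its step on the fresher parameters instead of overwriting them, so every published vector is the result of a complete update; the price is wasted work under contention, which is what a contention-regulating mechanism would aim to limit.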

Keywords: artificial neural networks; parallel algorithms; lock-free synchronization; stochastic gradient descent

Power Consumption Comparison of GPU Linear Solvers for Cellular Potts Model Simulations

APPLIED SCIENCES-BASEL, 2024, Vol. 14, No. 16, p. 7028

Authors: De Luca, Pasquale; Galletti, Ardelio; Marcellino, Livia. Affiliations: Parthenope Univ Naples, UNESCO Chair Environm Resources & Sustainable Dev, Dept Sci & Technol, Int PhD Programme, Ctr Direz Isola C4, I-80143 Naples, Italy; Parthenope Univ Naples, Dept Sci & Technol, Ctr Direz Isola C4, I-80143 Naples, Italy

Power consumption is a significant challenge in the sustainability of computational science. The growing energy demands of increasingly complex simulations and algorithms lead to substantial resource use, which conflicts with global sustainability goals. This paper investigates the energy efficiency of different parallel implementations of a Cellular Potts model, which models cellular behavior through Hamiltonian energy minimization techniques, leveraging modern GPU architectures. By evaluating alternative solvers, it demonstrates that specific methods can significantly enhance computational efficiency and reduce energy use compared to traditional approaches. The results confirm notable improvements in execution time and energy consumption. In particular, the experiments show a power reduction of up to 53%, providing a pathway towards more sustainable high-performance computing practices for complex biological simulations.
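For readers unfamiliar with the Cellular Potts model, a common textbook form of its Hamiltonian combines an adhesion term over neighbouring lattice sites with a quadratic volume constraint, and a Metropolis step accepts a proposed site copy with probability $\min(1, e^{-\Delta H / T})$. The sketch assumes exactly that form (one adhesion constant $J$, one target volume, a periodic 4-neighbour lattice). It does not reproduce the paper's model terms, GPU implementation, or linear solvers, and the function names are hypothetical.

```python
import math
import random

NEIGHBOURS = ((0, 1), (1, 0), (0, -1), (-1, 0))


def hamiltonian(grid, cell_ids, J, lam, target_volume):
    """H = J * (#neighbour pairs with different ids) + lam * sum_c (V_c - V_target)^2.

    Id 0 is the medium; cell_ids lists the cells whose volume is constrained.
    Periodic boundaries; each pair is counted once (right and down neighbours).
    """
    n = len(grid)
    adhesion = 0
    volume = {c: 0 for c in cell_ids}
    for i in range(n):
        for j in range(n):
            s = grid[i][j]
            if s in volume:
                volume[s] += 1
            if s != grid[i][(j + 1) % n]:
                adhesion += 1
            if s != grid[(i + 1) % n][j]:
                adhesion += 1
    return J * adhesion + lam * sum((v - target_volume) ** 2 for v in volume.values())


def metropolis_step(grid, cell_ids, J, lam, target_volume, temperature, rng):
    """One copy attempt: a random site tries to take a random neighbour's id.
    H is recomputed globally for clarity; real codes evaluate the local change."""
    n = len(grid)
    i, j = rng.randrange(n), rng.randrange(n)
    di, dj = rng.choice(NEIGHBOURS)
    source = grid[(i + di) % n][(j + dj) % n]
    old = grid[i][j]
    if source == old:
        return False
    before = hamiltonian(grid, cell_ids, J, lam, target_volume)
    grid[i][j] = source
    delta = hamiltonian(grid, cell_ids, J, lam, target_volume) - before
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        return True  # accept the copy
    grid[i][j] = old  # reject: restore the previous configuration
    return False


if __name__ == "__main__":
    grid = [[1 if i < 2 and j < 2 else 0 for j in range(4)] for i in range(4)]
    rng = random.Random(0)
    accepted = sum(metropolis_step(grid, [1], 1.0, 1.0, 4, 2.0, rng) for _ in range(100))
    print(accepted, grid)
```

Recomputing $H$ over the whole lattice for every proposal keeps the sketch short but costs $O(N)$ per step; practical implementations evaluate only the local change $\Delta H$, and that locality is what parallel versions typically exploit.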

Keywords: cellular Potts model; parallel algorithms; GPU computing; energy; performance profiling

CMAP-LAP: Configurable Massively Parallel Solver for Lattice Problems

28th Annual IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)

Authors: Tateiwa, Nariaki; Shinano, Yuji; Yamamura, Keiichiro; Yoshida, Akihiro; Kaji, Shizuo; Yasuda, Masaya; Fujisawa, Katsuki. Affiliations: Kyushu Univ, Grad Sch Math, Fukuoka, Japan; Zuse Inst Berlin (ZIB), Appl Algorithm Intelligence Methods (A2IM), Berlin, Germany; Kyushu Univ, Inst Math Ind, Fukuoka, Japan; Rikkyo Univ, Dept Math, Tokyo, Japan

ISBN: (print) 9781665410168

Lattice problems are a class of optimization problems that are notably hard. There are no classical or quantum algorithms known to solve these problems efficiently. Their hardness has made lattices a major cryptographic primitive for post-quantum cryptography. Several different approaches have been used for lattice problems with different computational profiles; some suffer from super-exponential time, and others require exponential space. This motivated us to develop a novel lattice problem solver, CMAP-LAP, based on the clever coordination of different algorithms that run massively in parallel. With our flexible framework, heterogeneous modules run asynchronously in parallel on a large-scale distributed system while exchanging information, which drastically boosts the overall performance. We also implement full checkpoint-and-restart functionality, which is vital to high-dimensional lattice problems. CMAP-LAP facilitates the implementation of large-scale parallel strategies for lattice problems since all the functions are designed to be customizable and abstract. Through numerical experiments with up to 103,680 cores, we evaluated the performance and stability of our system and demonstrated its high capability for future massive-scale experiments.
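The shortest vector problem named in the keywords can be solved exactly in dimension 2 by Lagrange-Gauss reduction, which gives a concrete feel for what the solver targets in high dimension. The sketch is not part of CMAP-LAP; it only handles 2-D integer bases with linearly independent vectors, and `lagrange_reduce` is a hypothetical helper.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lagrange_reduce(b1, b2):
    """Lagrange-Gauss reduction of a 2-D integer lattice basis.

    Returns a reduced basis (s, t) in which s is a shortest nonzero lattice vector.
    """
    if dot(b1, b1) > dot(b2, b2):
        b1, b2 = b2, b1
    while True:
        # Exact nearest-integer coefficient of b2 along b1.
        mu = round(Fraction(dot(b1, b2), dot(b1, b1)))
        b2 = tuple(x - mu * y for x, y in zip(b2, b1))
        if dot(b2, b2) >= dot(b1, b1):
            return b1, b2
        b1, b2 = b2, b1  # b2 became shorter: swap and continue


if __name__ == "__main__":
    print(lagrange_reduce((10, 3), (7, 2)))
```

No comparably cheap exact method is known in high dimension, which is why, as the abstract notes, practical approaches face super-exponential time or exponential space there.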

Keywords: Discrete optimization; Lattice problem; Lattice-based cryptography; Shortest vector problem; Parallel algorithms; Ubiquity Generator Framework

A Low Latency Parallel Bus Interface for High-Speed multi-FPGA RT-Simulations

IEEE Electric Ship Technologies Symposium (ESTS)

Authors: Difronzo, Michele; Ginn, Herbert L.; Benigni, Andrea. Affiliations: Univ South Carolina, Elect Engn, Columbia, SC 29208, USA; Rhein Westfal TH Aachen, Forschungszentrum Julich, Aachen, Germany

ISBN: (print) 9781728184265

In this paper we present a low-latency interface for high-speed multi-FPGA real-time simulation. The interface is based on a parallel bus structure and has been implemented using two Virtex UltraScale+ devices. The operation of the interface is first evaluated using a linear feedback shift register to compare numerical values exchanged over the bus. We then provide an example of how the interface is used for the simulation of a power electronics system, composed of two dual active bridge converters, using a time step of 70 ns. The results of the decoupled simulation are verified against those of a monolithic solution running on a single FPGA.
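A linear feedback shift register lets both ends of a link generate the same pseudo-random word sequence from a shared seed, so the receiver can check every word it receives without a reference copy being transmitted. The sketch uses the widely cited 16-bit maximal-length Fibonacci LFSR with taps 16, 14, 13 and 11. The width, taps, seed, and the `check_link` helper are assumptions, since the abstract does not specify the paper's test pattern generator.

```python
def lfsr16_step(state):
    """16-bit Fibonacci LFSR, taps 16, 14, 13, 11 (x^16 + x^14 + x^13 + x^11 + 1)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1  # feedback bit
    return (state >> 1) | (bit << 15)


def check_link(received_words, seed=0xACE1):
    """Receiver side: regenerate the sender's sequence and count mismatching words."""
    expected = seed
    errors = 0
    for word in received_words:
        if word != expected:
            errors += 1
        expected = lfsr16_step(expected)
    return errors


if __name__ == "__main__":
    sent = [0xACE1]
    for _ in range(7):
        sent.append(lfsr16_step(sent[-1]))
    print([hex(w) for w in sent], check_link(sent))
```

A maximal-length 16-bit register cycles through all 65,535 nonzero states before repeating, so a test burst shorter than that never repeats a pattern.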

Keywords: Communication systems; Low latency communication; Field Programmable Gate Arrays (FPGAs); Parallel algorithms; Real-Time (RT) systems
