Search Results - Inner Mongolia University Library

Genetic and Evolutionary Computation Conference (GECCO)

Authors: Zeng, Peng; Lensen, Andrew; Sun, Yanan. Affiliations: Sichuan Univ, Coll Comp Sci, Chengdu, Peoples R China; Victoria Univ Wellington, Sch Engn & Comp Sci, Wellington, New Zealand

ISBN: (print) 9781450392686

Genetic programming (GP) has been applied to image classification and achieved promising results. However, most GP-based image classification methods are only applied to small-scale image datasets because of their high computational cost. Efficient acceleration techniques are needed to extend GP-based image classification methods to large-scale datasets. Considering that fitness evaluation is the most time-consuming phase of the GP evolutionary process and is highly parallelizable, this paper proposes a CPU multi-processing and GPU parallel approach to perform this phase, thereby effectively accelerating GP for image classification. Experimental results show that the highly parallelized approach can significantly accelerate GP-based image classification without performance degradation. The training time of the GP-based image classification method is reduced from several weeks to tens of hours, enabling it to run on large-scale image datasets.

Keywords: genetic programming; image classification; parallel algorithms
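
Because each individual's fitness can be computed independently of the others, the CPU multi-processing side of this idea can be sketched with a process pool. This is a minimal sketch under assumptions not taken from the paper: the toy tree encoding, the `OPS` table, and the names `run_tree`, `fitness`, and `evaluate_population` are hypothetical, fitness is plain training accuracy on feature vectors, and the GPU part of the approach is not shown.

```python
import operator
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical toy GP encoding (not the paper's): an internal node is
# (op_name, left, right); a leaf is ("x", feature_index) or ("c", constant).
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_tree(tree, features):
    """Evaluate one GP tree on one feature vector."""
    kind = tree[0]
    if kind == "x":
        return features[tree[1]]
    if kind == "c":
        return tree[1]
    op, left, right = tree
    return OPS[op](run_tree(left, features), run_tree(right, features))


def fitness(job):
    """Training accuracy: predict class 1 when the tree output is positive."""
    tree, samples = job
    correct = sum((run_tree(tree, x) > 0) == bool(y) for x, y in samples)
    return correct / len(samples)


def evaluate_population(population, samples, executor_cls=ProcessPoolExecutor, workers=4):
    """Score every individual in parallel; fitness calls share no state."""
    with executor_cls(max_workers=workers) as pool:
        # map() preserves input order, so scores[i] belongs to population[i].
        return list(pool.map(fitness, [(tree, samples) for tree in population]))
```

With `ProcessPoolExecutor`, trees and samples must be picklable, and on platforms that use the `spawn` start method the call should sit under an `if __name__ == "__main__":` guard. Passing `ThreadPoolExecutor` as `executor_cls` is convenient for quick correctness checks, though it gives no CPU speedup in CPython.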

Consistent Lock-free Parallel Stochastic Gradient Descent for Fast and Stable Convergence

35th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Backstrom, Karl; Walulya, Ivan; Papatriantafilou, Marina; Tsigas, Philippas. Affiliation: Chalmers Univ Technol, Dept Comp Sci & Engn, Gothenburg, Sweden

ISBN: (print) 9781665440660

Stochastic Gradient Descent (SGD) is an essential element of Machine Learning (ML) algorithms. Asynchronous shared-memory parallel SGD (AsyncSGD), including synchronization-free algorithms such as HOGWILD!, has received interest in certain contexts due to its reduced overhead compared to synchronous parallelization. Although these algorithms induce staleness and inconsistency, they have shown speedup for problems with smooth, strongly convex targets and gradient sparsity. Recent works take important steps towards understanding the potential of parallel SGD for problems that do not conform to these strong assumptions, in particular deep learning (DL). There is, however, a gap in the current literature in understanding when AsyncSGD algorithms are useful in practice, and in particular what role mechanisms for synchronization and consistency play. We contribute to answering questions in this gap by studying a spectrum of parallel algorithmic implementations of AsyncSGD, aiming to understand how shared-data synchronization influences convergence properties in fundamental DL applications. We focus on the impact of consistency-preserving non-blocking synchronization on SGD convergence and on sensitivity to hyperparameter tuning. We propose Leashed-SGD, an extensible algorithmic framework of consistency-preserving implementations of AsyncSGD that employs lock-free synchronization, effectively balancing throughput and latency. Leashed-SGD features a natural contention-regulating mechanism, as well as dynamic memory management that allocates space only when needed. We argue analytically about the dynamics of the algorithms, memory consumption, the threads' progress over time, and the expected contention. We provide a comprehensive empirical evaluation, validating the analytical claims, benchmarking the proposed Leashed-SGD framework, and comparing it to baselines for two prominent DL applications: multilayer perceptrons (MLP) and convolutional neural networks (CNN). We o

Keywords: artificial neural networks; parallel algorithms; lock-free synchronization; stochastic gradient descent
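
The abstract contrasts consistency-preserving designs with synchronization-free AsyncSGD such as HOGWILD!. Below is a minimal sketch of that synchronization-free baseline, not of Leashed-SGD, assuming a one-parameter least-squares model; `hogwild_sgd` and its parameters are hypothetical. Threads read and write the shared weight without any lock, so a read can be stale and an update can be overwritten, which is the inconsistency the abstract refers to. In CPython the GIL serializes the arithmetic, so this shows the update pattern rather than a speedup.

```python
import threading


def hogwild_sgd(xs, ys, num_threads=4, epochs=100, lr=0.1):
    """Fit y ~ w * x with lock-free, HOGWILD!-style SGD on one shared weight."""
    shared = [0.0]  # shared model state; no lock guards reads or writes

    def worker(tid):
        # Each thread iterates over its own shard of the training data.
        shard = list(zip(xs[tid::num_threads], ys[tid::num_threads]))
        for _ in range(epochs):
            for x, y in shard:
                w = shared[0]                 # possibly stale read
                grad = 2.0 * (w * x - y) * x  # d/dw of (w * x - y) ** 2
                shared[0] = w - lr * grad     # unsynchronized write; may clobber another thread's update

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared[0]
```

A consistency-preserving variant would instead make each update act on a coherent snapshot of the parameters, for example by publishing whole parameter vectors atomically; the abstract states that Leashed-SGD does this with lock-free synchronization, and that mechanism is not reproduced here.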

Power Consumption Comparison of GPU Linear Solvers for Cellular Potts Model Simulations

APPLIED SCIENCES-BASEL, 2024, Vol. 14, No. 16, p. 7028

Authors: De Luca, Pasquale; Galletti, Ardelio; Marcellino, Livia. Affiliations: Parthenope Univ Naples, UNESCO Chair Environm Resources & Sustainable Dev, Dept Sci & Technol, Int PhD Programme, Ctr Direz Isola C4, I-80143 Naples, Italy; Parthenope Univ Naples, Dept Sci & Technol, Ctr Direz Isola C4, I-80143 Naples, Italy

Power consumption is a significant challenge in the sustainability of computational science. The growing energy demands of increasingly complex simulations and algorithms lead to substantial resource use, which conflicts with global sustainability goals. This paper investigates the energy efficiency of different GPU-based parallel implementations of a Cellular Potts model, which models cellular behavior through Hamiltonian energy minimization techniques. By evaluating alternative solvers, it demonstrates that specific methods can significantly enhance computational efficiency and reduce energy use compared to traditional approaches. The results confirm notable improvements in execution time and energy consumption. In particular, the experiments show a power reduction of up to 53%, providing a pathway towards more sustainable high-performance computing practices for complex biological simulations.

Keywords: cellular potts model; parallel algorithms; GPU computing; energy performance profiling
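
The abstract does not state the Hamiltonian. As an assumption, the sketch below uses the commonly cited Graner-Glazier form, H = sum over neighboring site pairs of J(σ_i, σ_j)(1 - δ(σ_i, σ_j)) + λ Σ_σ (a_σ - A_T)^2, plus the Metropolis rule used to accept or reject a proposed spin copy. Simplifications: J is keyed on cell ids rather than cell types, all cells share one target area, boundaries are not periodic, and the GPU linear solvers compared in the paper are not shown. The function names are hypothetical.

```python
import math


def cpm_hamiltonian(grid, J, lam, target_area):
    """Two-term Cellular Potts energy on a 2D lattice.

    grid: 2D list of cell ids (sigma); id 0 is treated as the medium.
    J(a, b): adhesion energy charged for each neighboring pair with a != b.
    lam, target_area: strength and target of the area constraint.
    """
    rows, cols = len(grid), len(grid[0])
    adhesion = 0.0
    areas = {}
    for r in range(rows):
        for c in range(cols):
            s = grid[r][c]
            areas[s] = areas.get(s, 0) + 1
            # Look only right and down so each von Neumann pair is counted once.
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < rows and cc < cols and grid[rr][cc] != s:
                    adhesion += J(s, grid[rr][cc])
    area_term = sum(lam * (a - target_area) ** 2 for s, a in areas.items() if s != 0)
    return adhesion + area_term


def metropolis_accept(delta_h, temperature, u):
    """Accept a copy that lowers H; otherwise accept with probability exp(-dH/T). u ~ U[0, 1)."""
    return delta_h <= 0 or u < math.exp(-delta_h / temperature)
```

A simulation step would propose copying a neighbor's id into a site, compute the energy change (in practice locally rather than by recomputing the whole sum), and keep the change only if `metropolis_accept` returns true.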

CMAP-LAP: Configurable Massively Parallel Solver for Lattice Problems

28th Annual IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)

Authors: Tateiwa, Nariaki; Shinano, Yuji; Yamamura, Keiichiro; Yoshida, Akihiro; Kaji, Shizuo; Yasuda, Masaya; Fujisawa, Katsuki. Affiliations: Kyushu Univ, Grad Sch Math, Fukuoka, Japan; Zuse Inst Berlin ZIB, Appl Algorithm Intelligence Methods A2IM, Berlin, Germany; Kyushu Univ, Inst Math Ind, Fukuoka, Japan; Rikkyo Univ, Dept Math, Tokyo, Japan

ISBN: (print) 9781665410168

Lattice problems are a class of optimization problems that are notably hard. No classical or quantum algorithms are known to solve them efficiently. Their hardness has made lattices a major cryptographic primitive for post-quantum cryptography. Several approaches with different computational profiles have been used for lattice problems; some suffer from super-exponential time, and others require exponential space. This motivated us to develop a novel lattice problem solver, CMAP-LAP, based on the clever coordination of different algorithms that run massively in parallel. With our flexible framework, heterogeneous modules run asynchronously in parallel on a large-scale distributed system while exchanging information, which drastically boosts overall performance. We also implement full checkpoint-and-restart functionality, which is vital for high-dimensional lattice problems. CMAP-LAP facilitates the implementation of large-scale parallel strategies for lattice problems, since all of its functions are designed to be customizable and abstract. Through numerical experiments with up to 103,680 cores, we evaluated the performance and stability of our system and demonstrated its high capability for future massive-scale experiments.

Keywords: Discrete optimization; Lattice problem; Lattice-based cryptography; Shortest vector problem; parallel algorithms; Ubiquity Generator Framework
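
CMAP-LAP coordinates different lattice algorithms at massive scale, and none of that machinery is reproduced here. To make the shortest vector problem (SVP) from the keywords concrete, the sketch covers only the smallest case: in dimension 2, Lagrange-Gauss reduction returns a shortest nonzero lattice vector exactly. The function name `gauss_reduce` is hypothetical, and higher dimensions require methods such as LLL/BKZ reduction, enumeration, or sieving.

```python
from fractions import Fraction


def gauss_reduce(b1, b2):
    """Lagrange-Gauss reduction of a 2D integer lattice basis.

    Returns a reduced basis (b1, b2) in which b1 is a shortest nonzero lattice vector.
    """
    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1]

    if dot(b1, b1) > dot(b2, b2):
        b1, b2 = b2, b1
    while True:
        # Size reduction: subtract the nearest-integer multiple of b1 from b2.
        # Fraction keeps the projection coefficient exact for integer inputs.
        mu = round(Fraction(dot(b1, b2), dot(b1, b1)))
        b2 = (b2[0] - mu * b1[0], b2[1] - mu * b1[1])
        if dot(b2, b2) >= dot(b1, b1):
            return b1, b2
        b1, b2 = b2, b1  # b2 became shorter: swap and repeat
```

Each step replaces the basis with another basis of the same lattice (the determinant is unchanged up to sign), so the result spans the same lattice as the input.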

A Low Latency Parallel Bus Interface for High-Speed multi-FPGA RT-Simulations

IEEE Electric Ship Technologies Symposium (ESTS)

Authors: Difronzo, Michele; Ginn, Herbert L.; Benigni, Andrea. Affiliations: Univ South Carolina, Elect Engn, Columbia, SC 29208, USA; Rhein Westfal TH Aachen, Forschungszentrum Julich, Aachen, Germany

ISBN: (print) 9781728184265

In this paper we present a low-latency interface for high-speed multi-FPGA real-time simulation. The interface is based on a parallel bus structure and has been implemented using two Virtex UltraScale+ devices. The operation of the interface is first evaluated using a linear feedback shift register (LFSR) to compare numerical values exchanged over the bus. We then provide an example of how the interface is used to simulate a power electronics system, composed of two dual active bridge converters, with a time step of 70 ns. The results of the decoupled simulation are verified against those of a monolithic solution running on a single FPGA.

Keywords: Communication systems; Low latency communication; Field Programmable Gate Arrays (FPGAs); parallel algorithms; Real-Time (RT) systems
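
The abstract says the bus was first checked by using a linear feedback shift register to compare values exchanged over it. A software sketch of that kind of check follows, assuming both ends run the same LFSR from the same seed so the receiver can regenerate the expected word sequence locally. The 16-bit Galois form with tap mask `0xB400` (taps 16, 14, 13, 11, a maximal-length choice with period 2^16 - 1), the seed `0xACE1`, and the function names are assumptions; bus width, timing, and handshaking are not modeled.

```python
def lfsr16(seed=0xACE1):
    """16-bit Galois LFSR with taps 16, 14, 13, 11 (mask 0xB400); yields states forever."""
    state = seed
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400  # apply the feedback taps when a 1 is shifted out


def count_bus_errors(received_words, seed=0xACE1):
    """Compare words captured at the receiver with a locally regenerated LFSR stream."""
    expected = lfsr16(seed)
    return sum(word != next(expected) for word in received_words)
```

Because the receiver never needs a copy of the transmitted data, only the shared seed, any nonzero mismatch count points to a corrupted, dropped, or reordered word on the link.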

Parallel Global Search Algorithm with Local Tuning for Solving Mixed-Integer Global Optimization Problems

LOBACHEVSKII JOURNAL OF MATHEMATICS, 2021, Vol. 42, No. 7, pp. 1492-1503

Authors: Barkalov, K. A.; Gergel, V. P.; Lebedev, I. G. Affiliation: Lobachevskii State Univ Nizhny Novgorod, Nizhnii Novgorod 603950, Russia

In this paper, we consider mixed-integer global optimization problems and propose a parallel algorithm for solving problems of this class, based on the information-statistical approach to solving continuous global optimization problems. Within this algorithm, we suggest using a local tuning scheme based on the assumption that the multiextremality of the problem under consideration is weak. We also compare the sequential version of the algorithm with other similar methods. The effectiveness of parallelizing the algorithm has been confirmed by solving a series of mixed-integer global optimization problems on the Lobachevskii supercomputer.

Keywords: global optimization; non-convex constraints; mixed-integer problems; local tuning; parallel algorithms
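
The information-statistical approach referred to here is the family of global search algorithms associated with Strongin. A minimal one-dimensional sketch of the classic global search step follows; it is not the paper's parallel, mixed-integer algorithm, and local tuning is not shown. It estimates a Lipschitz-type constant from adjacent trials, scores every interval with the characteristic R(i), and places the next trial inside the interval with the largest R(i). The reliability parameter `r`, the tolerance `eps`, the trial cap, and the name `global_search` are assumptions.

```python
def global_search(f, a, b, r=2.0, eps=1e-4, max_trials=500):
    """Strongin-style information-statistical global minimization of f on [a, b]."""
    xs = [a, b]          # trial points, kept sorted
    zs = [f(a), f(b)]    # objective values at those points
    for _ in range(max_trials):
        # Largest observed slope between neighboring trials, scaled by r (> 1).
        mu = max(abs(zs[i] - zs[i - 1]) / (xs[i] - xs[i - 1]) for i in range(1, len(xs)))
        m = r * mu if mu > 0 else 1.0

        def characteristic(i):
            dx = xs[i] - xs[i - 1]
            dz = zs[i] - zs[i - 1]
            return m * dx + dz * dz / (m * dx) - 2.0 * (zs[i] + zs[i - 1])

        t = max(range(1, len(xs)), key=characteristic)
        if xs[t] - xs[t - 1] < eps:
            break  # the most promising interval is already small enough
        # New trial; with r > 1 it lies strictly inside (xs[t-1], xs[t]).
        x_new = 0.5 * (xs[t] + xs[t - 1]) - (zs[t] - zs[t - 1]) / (2.0 * m)
        xs.insert(t, x_new)
        zs.insert(t, f(x_new))
    best = min(range(len(xs)), key=lambda i: zs[i])
    return xs[best], zs[best]
```

Integer variables are not handled here; one naive, hypothetical extension is to run the continuous search once per integer value and keep the best result, which is far less efficient than a dedicated mixed-integer scheme.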

Efficient parallel CP decomposition with pairwise perturbation and multi-sweep dimension tree

35th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Ma, Linjian; Solomonik, Edgar. Affiliation: Univ Illinois, Dept Comp Sci, Champaign, IL 61820, USA

ISBN: (print) 9781665440660

The widely used alternating least squares (ALS) algorithm for the canonical polyadic (CP) tensor decomposition is dominated in cost by the matricized-tensor times Khatri-Rao product (MTTKRP) kernel. This kernel is necessary to set up the quadratic optimization subproblems. State-of-the-art parallel ALS implementations use dimension trees to avoid redundant computations across MTTKRPs within each ALS sweep. In this paper, we propose two new parallel algorithms to accelerate CP-ALS. We introduce the multi-sweep dimension tree (MSDT) algorithm, which requires the contraction between an order-N input tensor and the first-contracted input matrix once every (N - 1)/N sweeps. This algorithm reduces the leading-order computational cost by a factor of 2(N - 1)/N relative to the best previously known approach. In addition, we introduce a more communication-efficient approach to parallelizing an approximate CP-ALS algorithm, pairwise perturbation. This technique uses perturbative corrections to the subproblems rather than recomputing the contractions, and asymptotically accelerates ALS. Our benchmark results on 1024 processors on the Stampede2 supercomputer show that CP decomposition obtains a 1.25X speed-up from MSDT and a 1.94X speed-up from pairwise perturbation compared to the state-of-the-art dimension-tree based CP-ALS implementations.

Keywords: Tensors; Program processors; Perturbation methods; Approximation algorithms; Supercomputers; Computational efficiency; parallel algorithms
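
To make the MTTKRP kernel and the dimension-tree reuse described above concrete, here is a minimal pure-Python sketch for an order-3 tensor stored as nested lists. It shows the direct mode-1 MTTKRP and how one partial contraction T (the tensor contracted with factor matrix C over mode 3) can be shared by the mode-1 and mode-2 MTTKRPs. It is not MSDT or pairwise perturbation, the function names are hypothetical, and real implementations use distributed dense or sparse tensor kernels.

```python
def mttkrp_mode1(X, B, C):
    """Direct mode-1 MTTKRP: M[i][r] = sum_{j,k} X[i][j][k] * B[j][r] * C[k][r]."""
    I, J, K, R = len(X), len(B), len(C), len(B[0])
    return [[sum(X[i][j][k] * B[j][r] * C[k][r] for j in range(J) for k in range(K))
             for r in range(R)] for i in range(I)]


def contract_mode3(X, C):
    """Partial contraction T[i][j][r] = sum_k X[i][j][k] * C[k][r].

    In a dimension tree, this intermediate is shared by the mode-1 and mode-2 MTTKRPs.
    """
    I, J, K, R = len(X), len(X[0]), len(C), len(C[0])
    return [[[sum(X[i][j][k] * C[k][r] for k in range(K)) for r in range(R)]
             for j in range(J)] for i in range(I)]


def mode1_from_T(T, B):
    """Mode-1 MTTKRP from T: only a cheap contraction with B remains."""
    return [[sum(T[i][j][r] * B[j][r] for j in range(len(B))) for r in range(len(B[0]))]
            for i in range(len(T))]


def mode2_from_T(T, A):
    """Mode-2 MTTKRP from the same T, contracting with factor matrix A."""
    return [[sum(T[i][j][r] * A[i][r] for i in range(len(A))) for r in range(len(A[0]))]
            for j in range(len(T[0]))]
```

In a CP-ALS sweep, A would be updated from the mode-1 result before `mode2_from_T` is called; T stays valid because C has not changed yet, so the expensive contraction with the full tensor is done once for both modes.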

HyPC-Map: A Hybrid Parallel Community Detection Algorithm Using Information-Theoretic Approach

IEEE High Performance Extreme Computing Conference (HPEC)

Authors: Faysal, Md Abdul M.; Arifuzzaman, Shaikh; Chan, Cy; Bremer, Maximilian; Popovici, Doru; Shalf, John. Affiliations: Univ New Orleans, New Orleans, LA 70148, USA; Lawrence Berkeley Natl Lab, Berkeley, CA, USA

ISBN: (print) 9781665423694

Community detection has become an important graph analysis kernel due to the tremendous growth of social networks and genomics discoveries. Even though there exist a large number of algorithms in the literature, studies show that community detection based on an information-theoretic approach (known as Infomap) delivers better quality solutions than others. Being inherently sequential, the Infomap algorithm does not scale well for large networks. In this work, we develop a hybrid parallel approach for community detection in graphs using Information Theory. We perform extensive benchmarking and analyze hardware parameters to identify and address performance bottlenecks. Additionally, we use cache-optimized data structures to improve cache locality. All of these optimizations lead to an efficient and scalable community detection algorithm, HyPC-Map, which demonstrates a 25-fold speedup (much higher than the state-of-the-art map-based techniques) without sacrificing the quality of the solution.

Keywords: Community Detection; parallel algorithms; Information-Theory; Map Equation; MDL; Graphs
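
HyPC-Map parallelizes Infomap, whose objective is the map equation listed in the keywords. A compact sketch of the standard two-level map equation follows, L(M) = q log2 q - 2 Σ q_i log2 q_i - Σ p_a log2 p_a + Σ (q_i + p_i) log2(q_i + p_i), assuming an undirected, unweighted graph with no teleportation, where a node's visit rate p_a is its degree divided by 2m and q_i is the flow exiting module i. It only scores a given partition; the search over partitions and the hybrid parallelization are not shown, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def plogp(p):
    """p * log2(p), with the convention 0 * log2(0) = 0."""
    return p * math.log2(p) if p > 0 else 0.0


def map_equation(edges, module_of):
    """Two-level map equation L(M), in bits, for an undirected, unweighted graph.

    edges: list of (u, v) pairs; module_of: dict mapping node -> module id.
    """
    two_m = 2 * len(edges)
    degree = defaultdict(int)
    exit_flow = defaultdict(float)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if module_of[u] != module_of[v]:
            # Each direction of an inter-module edge carries flow 1 / 2m out of its source module.
            exit_flow[module_of[u]] += 1 / two_m
            exit_flow[module_of[v]] += 1 / two_m
    p = {n: d / two_m for n, d in degree.items()}  # stationary visit rates
    module_flow = defaultdict(float)
    for n, pn in p.items():
        module_flow[module_of[n]] += pn
    q = sum(exit_flow.values())
    return (plogp(q)
            - 2 * sum(plogp(qi) for qi in exit_flow.values())
            - sum(plogp(pn) for pn in p.values())
            + sum(plogp(exit_flow.get(mod, 0.0) + pm) for mod, pm in module_flow.items()))
```

For two triangles joined by a single edge, splitting the graph at that edge yields a shorter description length than keeping one module, which is the kind of comparison an Infomap-style search performs many times.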

Achieving Speedups for Distributed Graph Biconnectivity

IEEE High Performance Extreme Computing Virtual Conference (HPEC)

Authors: Bogle, Ian; Slota, George M. Affiliation: Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12181, USA

ISBN: (electronic) 9781665497862

ISBN: (print) 9781665497862

As data scales continue to increase, studying the porting and implementation of shared-memory parallel algorithms for distributed-memory architectures becomes increasingly important. In this study we consider the problem of biconnectivity, which identifies cut vertices and cut edges in a graph. As part of our study, we implemented and optimized a shared-memory biconnectivity algorithm based on color propagation within a distributed-memory context. This algorithm is neither work- nor time-efficient. However, when we compare it to distributed implementations of theoretically efficient algorithms, we find that simple non-optimal algorithms can greatly outperform time-efficient algorithms in practice when implemented for real distributed-memory environments and real data. Overall, our distributed implementation for computing graph biconnectivity demonstrates an average strong-scaling speedup of 15x across 64 MPI ranks on a suite of irregular real-world inputs. We also note average speedups of 11x and 7.3x relative to the optimal serial algorithm and the fastest shared-memory implementation for the biconnectivity problem, respectively.

Keywords: parallel algorithms; graph algorithms; biconnectivity
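
The abstract compares against an optimal serial algorithm. The classic linear-time serial approach to cut vertices and cut edges (bridges) is the Hopcroft-Tarjan depth-first search with low-link values, a common choice for such a baseline; whether the paper used exactly this variant is not stated. The sketch below is that serial DFS, not the paper's distributed color-propagation algorithm, and `cut_vertices_and_bridges` is a hypothetical name.

```python
def cut_vertices_and_bridges(n, edges):
    """Hopcroft-Tarjan style DFS; returns (cut vertices, bridges) of an undirected graph."""
    adj = [[] for _ in range(n)]
    for eid, (u, v) in enumerate(edges):
        adj[u].append((v, eid))
        adj[v].append((u, eid))
    disc = [-1] * n  # DFS discovery time; -1 means unvisited
    low = [0] * n    # earliest discovery time reachable from u's subtree via one back edge
    clock = [0]
    cuts, bridges = set(), set()

    def dfs(u, parent_edge):
        disc[u] = low[u] = clock[0]
        clock[0] += 1
        children = 0
        for v, eid in adj[u]:
            if eid == parent_edge:
                continue  # skip only the tree edge used to reach u (parallel edges still count)
            if disc[v] == -1:
                children += 1
                dfs(v, eid)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    bridges.add((min(u, v), max(u, v)))  # v's subtree cannot bypass edge u-v
                if parent_edge is not None and low[v] >= disc[u]:
                    cuts.add(u)  # v's subtree cannot reach above u
            else:
                low[u] = min(low[u], disc[v])  # back edge
        if parent_edge is None and children > 1:
            cuts.add(u)  # a DFS root is a cut vertex iff it has two or more tree children

    for s in range(n):
        if disc[s] == -1:
            dfs(s, None)
    return cuts, bridges
```

The recursion keeps the sketch short; for large graphs it would hit Python's recursion limit and should be rewritten with an explicit stack.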

Parallel Top-K Motif Discovery in Weighted Networks

SSRN, 2023

Authors: Papadopoulos, Apostolos N.; Koutounidis, Nikolaos. Affiliation: Aristotle University of Thessaloniki, Greece

The enumeration of all cliques in a graph and finding the largest clique are important problems that, unfortunately, are computationally intensive. An alternative is to select only the most important motifs (e.g., small subgraphs or patterns), where importance is quantified by a function applied to a subgraph. Given a weighted graph G(V,E,w()), where V is the set of nodes, E is the set of edges, and w() is a function that returns the weight of an edge e, we seek the efficient computation of the top-k weighted triangles (and also higher-order cliques, e.g., 4-cliques, 5-cliques, etc.). More specifically, the proposed methodology is based on a parallel algorithm that is efficient and scalable and exploits the multi-threading capabilities of modern multi-core processors. We first present a solution for the discovery of top-k triangles, which are the simplest non-trivial cliques, and then generalize it to the discovery of top-k c-cliques of higher order, i.e., when c > 3. Performance evaluation results on real-life networks show that the proposed algorithmic technique is significantly more efficient than the centralized one and is also scalable, showing very good speedups as the number of CPU cores used increases. © 2023, The Authors. All rights reserved.

Keywords: parallel algorithms
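
A minimal sketch of vertex-partitioned parallel top-k triangle discovery is shown below, under assumptions not taken from the paper: a triangle's score is the sum of its three edge weights (the abstract only says importance comes from a function applied to the subgraph), each triangle is assigned to its smallest vertex so that exactly one worker sees it, and each worker keeps a size-k min-heap whose results are merged at the end. Only triangles (c = 3) are handled, and the function names are hypothetical. In CPython, threads do not speed up this CPU-bound loop because of the GIL; the pool only illustrates the decomposition.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def local_top_k(vertices, adj, k):
    """Top-k triangles whose smallest vertex lies in `vertices` (one worker's share)."""
    heap = []  # min-heap of (weight, (u, v, w)); heap[0] is the weakest triangle kept
    for u in vertices:
        higher = sorted(v for v in adj[u] if v > u)
        for v, w in combinations(higher, 2):  # u < v < w, so each triangle is seen once
            if w in adj[v]:
                # Assumed scoring: sum of the three edge weights.
                item = (adj[u][v] + adj[u][w] + adj[v][w], (u, v, w))
                if len(heap) < k:
                    heapq.heappush(heap, item)
                elif item > heap[0]:
                    heapq.heapreplace(heap, item)
    return heap


def top_k_triangles(adj, k, workers=4):
    """Split vertices across workers, then merge the per-worker top-k lists."""
    nodes = sorted(adj)
    chunks = [nodes[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_worker = pool.map(local_top_k, chunks, [adj] * workers, [k] * workers)
        # The global top-k is contained in the union of the local top-k lists.
        return heapq.nlargest(k, (t for part in per_worker for t in part))
```

Here `adj` is a dict of dicts mapping each node to its neighbors and edge weights; the merge step is cheap because each worker returns at most k candidates.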
