Search results - Inner Mongolia University Library

High-performance watershed delineation algorithm for GPU using CUDA and OpenMP

ENVIRONMENTAL MODELLING & SOFTWARE, 2023, Vol. 160

Author: Kotyra, Bartlomiej (Marie Curie Sklodowska Univ, Inst Comp Sci, Ul Akad 9, PL-20033 Lublin, Poland)

Watershed delineation is one of the fundamental tasks in hydrological studies. Tools for extracting watersheds from digital elevation models and flow direction rasters are commonly implemented in GIS software packages. However, the performance of available techniques and algorithms often turns out to be far from sufficient, especially when working with large datasets. While modern hardware offers high computing performance through massive parallelism, there is still a need for algorithms that can effectively use these capabilities. This paper proposes an algorithm for rapid watershed delineation directly from flow direction rasters, using the possibilities offered by modern GPU devices. Performance measurements show a significant reduction in execution time compared to other parallel solutions proposed for this task in the literature. Moreover, this implementation makes it possible to delineate multiple watersheds from the same dataset simultaneously, each having one or more outlet cells, with virtually no additional computational cost.

Keywords: Watershed delineation; GIS; Parallel algorithms; GPU; CUDA; OpenMP
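
To make the idea of delineating watersheds directly from a flow direction raster concrete, here is a minimal sequential sketch. It is not the CUDA/OpenMP algorithm from the paper. It assumes an ESRI-style D8 encoding (1 = east, 2 = south-east, 4 = south, and so on), and the function name `delineate` and its arguments are hypothetical. Each cell is labelled by walking downstream until an already labelled cell, a sink, or the raster edge is reached, so several watersheds with one or more outlet cells each are handled in a single pass.

```python
# Assumed ESRI-style D8 codes: code -> (row step, column step).
D8 = {1: (0, 1), 2: (1, 1), 4: (1, 0), 8: (1, -1),
      16: (0, -1), 32: (-1, -1), 64: (-1, 0), 128: (-1, 1)}


def delineate(flow_dir, outlets):
    """Label every cell with the id of the watershed it drains to (0 = none).

    flow_dir: 2D list of D8 codes; any other value means "no outflow".
    outlets:  dict mapping (row, col) -> positive watershed id.
    """
    rows, cols = len(flow_dir), len(flow_dir[0])
    unknown = -1
    label = [[unknown] * cols for _ in range(rows)]
    for (r, c), watershed_id in outlets.items():
        label[r][c] = watershed_id

    for r0 in range(rows):
        for c0 in range(cols):
            path = []
            r, c = r0, c0
            result = 0
            # Walk downstream until a labelled cell, the raster edge, or a sink.
            while True:
                if not (0 <= r < rows and 0 <= c < cols):
                    break  # flowed off the raster: belongs to no watershed
                if label[r][c] != unknown:
                    result = label[r][c]
                    break
                path.append((r, c))
                label[r][c] = 0  # provisional mark; also stops cycles in bad data
                step = D8.get(flow_dir[r][c])
                if step is None:
                    break  # sink that is not an outlet
                r, c = r + step[0], c + step[1]
            # Every cell on the walked path shares the same destination.
            for pr, pc in path:
                label[pr][pc] = result
    return label
```

Because each cell is visited once as part of some path, the sketch runs in time linear in the raster size; adding more outlets only adds entries to `outlets`.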

VaLiPro: Linear Programming Validator for Cluster Computing Systems

Supercomputing Frontiers and Innovations, 2021, Vol. 8, No. 3, pp. 51-61

Authors: Sokolinsky, Leonid B.; Sokolinskaya, Irina M. (Chelyabinsk, Russia)

The article presents and evaluates a scalable algorithm for validating solutions to linear programming problems on cluster computing systems. The main idea of the method is to generate a regular set of points (the validation set) on a small-radius hypersphere centered at the solution point submitted for validation. The objective function is computed at each point of the validation set that belongs to the feasible region. If all the values are less than or equal to the value of the objective function at the point that is to be validated, then this point is the correct solution. The parallel implementation of the VaLiPro algorithm is written in C++ through the parallel BSF-skeleton, which encapsulates all aspects related to the MPI-based parallelization of the program. We provide the results of large-scale computational experiments on a cluster computing system to study the scalability of the VaLiPro algorithm. © 2021 The Authors. This paper is published with open access at ***. All Rights Reserved.

Keywords: Parallel algorithms
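
The validation idea in the abstract can be illustrated with a small sequential sketch restricted to two dimensions. This is not the VaLiPro implementation: the function name `validate_lp_solution`, the use of evenly spaced points on a circle as the "regular set", and the tolerance values are all assumptions made for illustration. The problem is taken to be: maximize c·x subject to A x <= b.

```python
import math


def validate_lp_solution(c, A, b, x0, radius=1e-2, n_points=360, tol=1e-9):
    """Check a candidate maximiser x0 of c.x subject to A x <= b (2D only)."""

    def objective(p):
        return c[0] * p[0] + c[1] * p[1]

    def feasible(p):
        return all(row[0] * p[0] + row[1] * p[1] <= bi + tol
                   for row, bi in zip(A, b))

    if not feasible(x0):
        return False
    best = objective(x0)
    # Validation set: evenly spaced points on a small circle around x0.
    for k in range(n_points):
        angle = 2.0 * math.pi * k / n_points
        p = (x0[0] + radius * math.cos(angle),
             x0[1] + radius * math.sin(angle))
        # Only feasible points count; any of them beating x0 refutes it.
        if feasible(p) and objective(p) > best + tol:
            return False
    return True
```

For example, with c = (1, 1) and the unit square as the feasible region, the vertex (1, 1) passes the check, while the interior point (0.5, 0.5) and the non-optimal vertex (1, 0) are rejected.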

Problem Specific Communications in Distributed Quantum Computing

Quantum Communications, Networking, and Computing (QCNC), International Conference on

Authors: Joyanta Basak; Bing Wang; Sanguthevar Rajasekaran (School of Computing, University of Connecticut, USA)

ISBN: (digital) 9798331531591

ISBN: (print) 9798331531607

When not all the qubits needed for solving a problem are located in a single quantum computer, qubits from different quantum computers can be collectively utilized. In this case, quantum communication is needed for the multiple quantum computers to communicate with each other. Several studies address the problem of minimizing the number of quantum communications when evaluating a general quantum circuit. The solutions proposed typically involve solving some intractable problems. In this paper, we show that we can obtain much better solutions when we focus on solving specific problems (instead of seeking solutions for generic circuits). Specifically, we consider several fundamental quantum circuits and identify communication protocols that need a much smaller number of communication steps than those offered by generic solutions. Our work is in line with traditional parallel and distributed computing research, where scientists typically focus on solving specific problems (such as sorting, matrix multiplication, network flow, etc.) in a parallel or distributed setting.

Keywords: Quantum computing; Protocols; Source coding; Qubit; Search problems; Complexity theory; Quantum networks; Quantum circuit; Parallel algorithms; Sorting

Hybrid Quantum Algorithms for N-Body Simulations

Quantum Communications, Networking, and Computing (QCNC), International Conference on

Authors: S. Rajasekaran; P. Agrawal; R. Lagasse; B. Chaudhuri; S. Najafian (School of Computing, Univ. of Connecticut; School of Pharmacy, Univ. of Connecticut)

ISBN: (digital) 9798331531591

ISBN: (print) 9798331531607

N-body simulation is a fundamental problem in molecular dynamics. For example, N-body simulations can be employed to study powder flow. It is worth noting that powder or granular materials are the second most ubiquitous substance in industry after water. Numerous sequential and parallel algorithms have been proposed in the literature for N-body simulations on classical computers. Given a set of n particles in a d-dimensional space, a brute force algorithm will take Ω(n²d) time to simulate each step of this system of particles. Numerous algorithms in the literature perform better than this under some suitable assumptions. Given the potential speedups offered by quantum computing, an interesting open problem is to investigate how much speedup can be obtained for N-body simulations using quantum computers. In this paper we present efficient algorithms that run on quantum-classical hybrid models of computing. Specifically, our algorithms solve the problem of finding close neighbors using a quantum computer and the other steps on a parallel Random Access Machine (PRAM). Our quantum algorithms outperform other algorithms in the literature asymptotically as well as on empirical evaluations.

Keywords: Computers; Industries; Quantum algorithm; Powders; Computational modeling; Force; Phase change random access memory; Parallel algorithms; Quantum communication
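
The brute-force baseline mentioned in the abstract can be sketched for the close-neighbor step as follows. This is the classical all-pairs approach, not the quantum or PRAM algorithms of the paper; the function name `close_pairs` and the `cutoff` parameter are hypothetical. It examines all n(n-1)/2 pairs and spends time proportional to d on each distance, which is where the n²d cost per step comes from.

```python
import math
from itertools import combinations


def close_pairs(points, cutoff):
    """Return index pairs (i, j), i < j, whose Euclidean distance is <= cutoff.

    points: sequence of equal-length coordinate tuples (d-dimensional).
    """
    pairs = []
    # All unordered pairs: n * (n - 1) / 2 distance evaluations of cost O(d).
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        if math.dist(p, q) <= cutoff:
            pairs.append((i, j))
    return pairs
```

A call such as `close_pairs([(0, 0, 0), (1, 0, 0), (5, 5, 5), (1, 1, 0)], 1.0)` reports the pairs of particles that lie within the cutoff of each other.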

FPGAs in Reduct Calculation Using Rough Sets

20th International Conference on Dependability of Computer Systems, DepCoS-RELCOMEX 2025

Author: Kopczynski, Maciej (Faculty of Computer Science, Bialystok University of Technology, Wiejska 45A, Bialystok 15-351, Poland)

ISBN: (print) 9783031927331

As data volume grows, computational speed becomes a key challenge. Data reduction helps address this by eliminating redundancy in rough sets using a reduct. However, most reduct-generation algorithms rely on software, which suffers from limitations like fixed word length and execution delays due to instruction processing, making them relatively slow. This paper proposes a hardware implementation of a two-stage greedy algorithm for reduct computation. The first stage identifies the core via a discernibility matrix, while the second enriches it with essential attributes. The presented algorithms were implemented on both Altera and Xilinx Field Programmable Gate Array (FPGA) units for high-speed, parallel processing and compared to a C-based software implementation on a PC. Results show a significant improvement in processing speed using the hardware approach. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Parallel algorithms

The Research of Parallel Wavelet Transform Algorithms on Remote Sensing Image

International Conference on Electromechanical Control Technology and Transportation (ICECTT)

Author: Shengzhong Zhang (Information Network Center, China University of Geosciences (Beijing), Beijing, China)

ISBN: (digital) 9781728199283

ISBN: (print) 9781728199290

Focusing on the Wavelet Transform, the paper explores four parallel Wavelet Transform algorithms and techniques for remote sensing images from the perspectives of data parallelism and algorithm parallelism. Among them, the algorithm based on "Working Pool parallel" achieves dynamic load balancing without any limit on the scale of the data or the number of slaves. This algorithm therefore makes it easier to process the vast data of remote sensing images rapidly in distributed network systems.

Keywords: Wavelet transforms; Filtering; Task analysis; Heuristic algorithms; Recycling; Wavelet analysis; Parallel algorithms

Parallel Clique Counting and Peeling Algorithms

arXiv, 2020

Authors: Shi, Jessica; Dhulipala, Laxman; Shun, Julian (MIT CSAIL, Cambridge, MA, United States)

We present a new parallel algorithm for k-clique counting/listing that has polylogarithmic span (parallel time) and is work-efficient (matches the work of the best sequential algorithm) for sparse graphs. Our algorithm is based on computing low out-degree orientations, which we present new linear-work and polylogarithmic-span algorithms for computing in parallel. We also present new parallel algorithms for producing unbiased estimations of clique counts using graph sparsification. Finally, we design two new parallel work-efficient algorithms for approximating the k-clique densest subgraph, the first of which is a 1/k-approximation and the second of which is a 1/(k(1 + ε))-approximation and has polylogarithmic span. Our first algorithm does not have polylogarithmic span, but we prove that it solves a P-complete problem. In addition to the theoretical results, we also implement the algorithms and propose various optimizations to improve their practical performance. On a 30-core machine with two-way hyper-threading, our algorithms achieve 13.23-38.99x and 1.19-13.76x self-relative parallel speedup for k-clique counting and k-clique densest subgraph, respectively. Compared to the state-of-the-art parallel k-clique counting algorithms, we achieve up to 9.88x speedup, and compared to existing implementations of k-clique densest subgraph, we achieve up to 11.83x speedup. We are able to compute the 4-clique counts on the largest publicly-available graph with over two hundred billion edges for the first time. Copyright © 2020, The Authors. All rights reserved.

Keywords: Parallel algorithms
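
The role of an orientation in k-clique counting can be shown with a short sequential sketch. It is not the parallel algorithm from the paper: as a simple stand-in for the low out-degree orientations described there, it orients each edge from the lower-degree endpoint to the higher-degree one (ties broken by vertex id), and the function name `count_k_cliques` is hypothetical. Once the graph is oriented, each clique is found exactly once, starting from its lowest-ranked vertex and intersecting out-neighborhoods.

```python
def count_k_cliques(n, edges, k):
    """Count k-cliques in an undirected simple graph on vertices 0..n-1."""
    if k < 1:
        raise ValueError("k must be >= 1")
    if k == 1:
        return n
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Rank vertices by (degree, id); orient each edge from lower to higher rank.
    order = sorted(range(n), key=lambda v: (degree[v], v))
    rank = {v: i for i, v in enumerate(order)}
    out = [set() for _ in range(n)]
    for u, v in edges:
        if rank[u] < rank[v]:
            out[u].add(v)
        else:
            out[v].add(u)

    def extend(candidates, remaining):
        # candidates: vertices adjacent to, and ranked above, all chosen so far.
        if remaining == 1:
            return len(candidates)
        return sum(extend(candidates & out[v], remaining - 1)
                   for v in candidates)

    return sum(extend(out[v], k - 1) for v in range(n))
```

On the complete graph with four vertices this counts 6 edges (k = 2), 4 triangles (k = 3) and a single 4-clique (k = 4).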

BrahMap: A scalable and modular map-making framework for the CMB experiments

Euromicro Conference on Parallel, Distributed and Network-Based Processing

Authors: Avinash Anand; Giuseppe Puglisi (Dipartimento di Fisica, Università di Roma Tor Vergata, Via della Ricerca Scientifica 1, Roma, Italy; INFN Sezione di Roma2, Università di Roma Tor Vergata, Via della Ricerca Scientifica 1, Roma, Italy; Dipartimento di Fisica e Astronomia, Università degli Studi di Catania, Via S. Sofia 64, Catania, Italy; INAF Osservatorio Astrofisico di Catania, Via S.Sofia 78, Catania, Italy; INFN Sezione di Catania, Via S.Sofia 64, Catania, Italy)

ISBN: (digital) 9798331524937

ISBN: (print) 9798331524944

Cosmic microwave background (CMB) experiments have reached an era of unprecedented precision and complexity. Aiming to detect the primordial B-mode polarization signal, these experiments will soon be equipped with $10^{4}$ to $10^{5}$ detectors. Consequently, future CMB missions will face the substantial challenge of efficiently processing vast amounts of raw data to produce the initial scientific outputs - the sky maps - within a reasonable time frame and with available computational resources. To address this, we introduce BrahMap, a new map-making framework that will be scalable across both CPU and GPU platforms. Implemented in C++ with a user-friendly Python interface for handling sparse linear systems, BrahMap employs advanced numerical analysis and high-performance computing techniques to maximize the use of super-computing infrastructure. This work features an overview of BrahMap's capabilities and preliminary performance scaling results, with application to a generic CMB polarization experiment.

Keywords: Numerical analysis; Scalability; Object oriented modeling; Pipelines; Graphics processing units; Numerical models; Parallel algorithms; Object oriented programming; Physics; Python

Efficient Parallel Shortest Path Algorithms

International Symposium on Parallel and Distributed Computing

Authors: David R. Alves; Madan S. Krishnakumar; Vijay K. Garg (Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA)

ISBN: (digital) 9781728189468

ISBN: (print) 9781728189475

Finding the shortest path between nodes in a graph has wide applications in many important areas such as transportation and computer networks. However, the current reference algorithms for this task, Dijkstra's for single-threaded environments and Δ-stepping for multi-threaded ones, leave performance and efficiency on the table by not taking advantage of additional information available about the graph. In this paper we present and experimentally evaluate novel algorithms SP1, SP2 and ParSP2 that leverage these constraints to solve the problem faster and more efficiently in key metrics. In single-threaded execution, we show how SP1 and SP2 outperform Dijkstra's algorithm by up to 46%. In multi-threaded execution, we show how our algorithms compare favorably to the Δ-stepping algorithm in the ability to establish the shortest path between the source and the median node.

Keywords: Classification algorithms; Complexity theory; Parallel algorithms; Instruction sets; Computer networks; Shortest path problem

FFT Algorithm Optimization and RD Imaging Algorithm Implementation Based on Heterogeneous Platform

International Conference on Communication Technology (ICCT)

Authors: BingZhi Hou; Chengguang Ma; Junyu Li; Daiwei Li (Graduate School of the Second Research Institute of China Aerospace Science and Industry Corporation, Beijing, China; Beijing Remote Sensing Equipment Research Institute, Beijing, China; China Telecom Corporation Limited, Beijing, China)

ISBN: (digital) 9798350363760

ISBN: (print) 9798350363777

This paper implements the Fast Fourier Transform (FFT) algorithm for signal data processing using Open Computing Language (OpenCL). A parallel algorithm model suitable for staged FFT across different GPUs is proposed, including methods for execution and memory model settings. The characteristics of the OpenCL model and specific data structures are applied to optimize the logical structure of the parallel algorithm. Finally, the proposed method is applied and implemented in the Synthetic Aperture Radar (SAR) imaging RD algorithm. Experimental data confirm that the computational speed of the parallel algorithm in this paper is significantly higher than that of a serial CPU-based algorithm. Compared to the fastest FFT algorithm FFTW on the current CPU platform, it achieves substantially better performance. Additionally, compared to the CUDA-based CUFFT parallel algorithm, the performance of the algorithm in this paper is notably improved. In the SAR imaging RD algorithm, based on classical airborne SAR imaging parameters, it shows a significant improvement over FFTW.

Keywords: Fast Fourier transforms; Computational modeling; Signal processing algorithms; Graphics processing units; Imaging; Signal processing; Programming; Radar imaging; Radar polarimetry; Parallel algorithms
