Link prediction can help rectify inaccuracies in various graph algorithms, stemming from unaccounted-for or overlooked links within networks. However, many existing works use a baseline approach, which incurs unnecessary computational costs due to its high time complexity. Further, many studies focus on smaller graphs, which can lead to misleading conclusions. Here, we study the prediction of links using neighborhood-based similarity measures on large graphs. In particular, we improve upon the baseline approach (IBase), and propose a heuristic approach that additionally disregards large hubs (DLH), based on the idea that high-degree nodes contribute little similarity among their neighbors. On a server equipped with dual 16-core Intel Xeon Gold 6226R processors, DLH is on average $1019\times$ faster than IBase, especially on web graphs and social networks, while maintaining similar prediction accuracy. Notably, DLH achieves a link prediction rate of 38.1M edges/s and improves performance by $1.6\times$ for every doubling of threads.
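As an illustration of the hub-skipping idea, a minimal C++ sketch is given below; the degree threshold and the Jaccard-style common-neighbor measure are assumptions for the example, not the paper's exact DLH formulation.

```cpp
// Sketch: neighborhood-based link prediction that skips large hubs (DLH idea).
// The degree threshold and the Jaccard similarity choice are illustrative
// assumptions, not the exact measures used in the paper.
#include <cstdio>
#include <unordered_set>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // adjacency lists

// Jaccard similarity of the neighborhoods of u and v, ignoring any common
// neighbor whose degree exceeds maxDegree (a "large hub").
double hubSkippingJaccard(const Graph& g, int u, int v, size_t maxDegree) {
  std::unordered_set<int> nu(g[u].begin(), g[u].end());
  size_t common = 0;
  for (int w : g[v]) {
    if (g[w].size() > maxDegree) continue;   // disregard large hubs
    if (nu.count(w)) ++common;
  }
  size_t uni = g[u].size() + g[v].size() - common;
  return uni ? static_cast<double>(common) / uni : 0.0;
}

int main() {
  Graph g = {{1, 2}, {0, 2, 3}, {0, 1, 3}, {1, 2}};
  std::printf("sim(0,3) = %.3f\n", hubSkippingJaccard(g, 0, 3, /*maxDegree=*/8));
}
```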
Graph neural networks (GNNs) are among the most powerful tools in deep learning. They routinely solve complex problems on unstructured networks, such as node classification, graph classification, or link prediction, with high accuracy. However, both inference and training of GNNs are complex, and they uniquely combine the features of irregular graph processing with dense and regular computations. This complexity makes it very challenging to execute GNNs efficiently on modern massively parallel architectures. To alleviate this, we first design a taxonomy of parallelism in GNNs, considering data and model parallelism, and different forms of pipelining. Then, we use this taxonomy to investigate the amount of parallelism in numerous GNN models, GNN-driven machine learning tasks, software frameworks, or hardware accelerators. We use the work-depth model, and we also assess communication volume and synchronization. We specifically focus on the sparsity/density of the associated tensors, in order to understand how to effectively apply techniques such as vectorization. We also formally analyze GNN pipelining, and we generalize the established Message-Passing class of GNN models to cover arbitrary pipeline depths, facilitating future optimizations. Finally, we investigate different forms of asynchronicity, navigating the path for future asynchronous parallel GNN pipelines. The outcomes of our analysis are synthesized in a set of insights that help to maximize GNN performance, and a comprehensive list of challenges and opportunities for further research into efficient GNN computations. Our work will help to advance the design of future GNNs.
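As a concrete instance of data parallelism over nodes, a minimal C++ sketch of one message-passing aggregation step follows; the mean aggregation, the OpenMP loop, and the absence of a learned update function are simplifying assumptions, not a description of any specific framework.

```cpp
// Sketch: one message-passing GNN aggregation step (mean over neighbors),
// parallelized over nodes with OpenMP. The feature dimension, the aggregation
// function, and the missing learned update are simplifying assumptions.
#include <cstdio>
#include <vector>

using Graph = std::vector<std::vector<int>>;        // adjacency lists
using Features = std::vector<std::vector<double>>;  // one row per node

Features meanAggregate(const Graph& g, const Features& h) {
  const int n = static_cast<int>(g.size());
  const size_t d = h.empty() ? 0 : h[0].size();
  Features out(n, std::vector<double>(d, 0.0));
  #pragma omp parallel for schedule(dynamic)        // data parallelism over nodes
  for (int v = 0; v < n; ++v) {
    for (int u : g[v])
      for (size_t k = 0; k < d; ++k) out[v][k] += h[u][k];
    if (!g[v].empty())
      for (size_t k = 0; k < d; ++k) out[v][k] /= g[v].size();
  }
  return out;
}

int main() {
  Graph g = {{1, 2}, {0}, {0}};
  Features h = {{1.0, 0.0}, {0.0, 1.0}, {2.0, 2.0}};
  Features z = meanAggregate(g, h);
  std::printf("z[0] = (%.2f, %.2f)\n", z[0][0], z[0][1]);  // mean of nodes 1 and 2
}
```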
Various classic reasoning problems with natural hypergraph representations are known to be tractable if a hypertree decomposition (HD) of low width exists. The resulting algorithms are attractive for practical use in fields like databases and constraint satisfaction. However, algorithmic use of HDs relies on the difficult task of first computing a decomposition of the hypergraph underlying a given problem instance, which is then used to guide the algorithm for this particular instance. The performance of purely sequential methods for computing HDs is inherently limited, yet the problem is, theoretically, amenable to parallelisation. In this article, we propose the first algorithm for computing hypertree decompositions that is well suited for parallelisation. The newly proposed algorithm log-k-decomp requires only a logarithmic number of recursion levels and additionally allows for highly parallelised pruning of the search space by restriction to so-called balanced separators. We provide a detailed experimental evaluation over the HyperBench benchmark and demonstrate that log-k-decomp outperforms the current state of the art significantly.
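The logarithmic number of recursion levels follows from a standard halving argument: under the usual notion of a balanced separator (removing it leaves components that each contain at most half of the hypergraph's edges, an assumption stated here for illustration), the depth bound is immediate:

$$
|E(C_i)| \le \tfrac{1}{2}\,|E(H)| \ \text{ for every remaining component } C_i
\quad\Longrightarrow\quad
\text{recursion depth} \;\le\; \lceil \log_2 |E(H)| \rceil .
$$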
Computing intersections among sets of one-dimensional intervals is a ubiquitous problem in computational geometry with important applications in bioinformatics, where the size of typical inputs is large and it is therefore important to use efficient algorithms. In this paper we propose a parallel algorithm for the 1D intersection-counting problem, that is, the problem of counting the number of intersections between each interval in a given set A and every interval in a set B. Our algorithm is suitable for shared-memory architectures (e.g., multicore CPUs) and GPUs. The algorithm is work-efficient because it performs the same amount of work as the best serial algorithm for this kind of problem. Our algorithm has been implemented in C++ using the Thrust parallel algorithms library, enabling the generation of optimized programs for multicore CPUs and GPUs from the same source code. The performance of our algorithm is evaluated on synthetic and real datasets, showing good scalability on different generations of hardware.
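As a point of reference, the counting problem can be solved per query with sorted endpoints and binary search, as in the C++ sketch below; this is a standard formulation, not the Thrust-based algorithm of the paper.

```cpp
// Sketch: count, for each interval in A, how many intervals in B it intersects,
// using sorted endpoints and binary search. Standard formulation for illustration,
// not the specific Thrust-based algorithm from the paper.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Interval { double lo, hi; };

std::vector<long> countIntersections(const std::vector<Interval>& A,
                                     const std::vector<Interval>& B) {
  std::vector<double> starts, ends;
  for (const auto& b : B) { starts.push_back(b.lo); ends.push_back(b.hi); }
  std::sort(starts.begin(), starts.end());
  std::sort(ends.begin(), ends.end());

  std::vector<long> counts(A.size());
  #pragma omp parallel for                       // each query is independent
  for (long i = 0; i < static_cast<long>(A.size()); ++i) {
    // Intervals of B that do NOT intersect A[i] either start after A[i].hi
    // or end before A[i].lo; every other interval intersects.
    long startAfter = starts.end() -
        std::upper_bound(starts.begin(), starts.end(), A[i].hi);
    long endBefore  = std::lower_bound(ends.begin(), ends.end(), A[i].lo) -
        ends.begin();
    counts[i] = static_cast<long>(B.size()) - startAfter - endBefore;
  }
  return counts;
}

int main() {
  std::vector<Interval> A = {{1, 4}, {6, 7}};
  std::vector<Interval> B = {{0, 2}, {3, 5}, {8, 9}};
  for (long c : countIntersections(A, B)) std::printf("%ld\n", c);  // prints 2, 0
}
```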
In this paper, we analyze two types of refutations for Unit Two Variable Per Inequality (UTVPI) constraints. A UTVPI constraint is a linear inequality of the form: $a_{i}\cdot x_{i}+a_{j} \cdot x_{j} \le b_{k}$, where $a_{i},a_{j}\in \{0,1,-1\}$ and $b_{k} \in \mathbb{Z}$. A conjunction of such constraints is called a UTVPI constraint system (UCS) and can be represented in matrix form as: $\mathbf{A} \cdot \mathbf{x} \le \mathbf{b}$. UTVPI constraints are used in many domains including operations research and program verification. We focus on two variants of read-once refutation (ROR). An ROR is a refutation in which each constraint is used at most once. A literal-once refutation (LOR), a more restrictive form of ROR, is a refutation in which each literal ($x_i$ or $-x_i$) is used at most once. First, we examine the constraint-required read-once refutation (CROR) problem and the constraint-required literal-once refutation (CLOR) problem. In both of these problems, we are given a set of constraints that must be used in the refutation. RORs and LORs are incomplete since not every system of linear constraints is guaranteed to have such a refutation. This is still true even when we restrict ourselves to UCSs. In this paper, we provide NC reductions between the CROR and CLOR problems in UCSs and the minimum weight perfect matching problem. The reductions used in this paper assume a CREW PRAM model of parallel computation. As a result, the reductions establish that, from the perspective of parallel algorithms, the CROR and CLOR problems in UCSs are equivalent to matching. In particular, if an NC algorithm exists for either of these problems, then there is an NC algorithm for matching.
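As a small illustration (constructed here, not taken from the paper), the following UCS admits a refutation that is both read-once and literal-once: summing the three constraints, with each constraint used exactly once and each literal appearing exactly once, yields the contradiction $0 \le -1$:

$$
(x_1 + x_2 \le 0) \;+\; (-x_1 + x_3 \le -1) \;+\; (-x_2 - x_3 \le 0)
\;\;\Longrightarrow\;\; 0 \le -1 .
$$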
With increasing grid modernization efforts, future electric grids will be governed by more complex and faster dynamics due to the high penetration of new components such as power electronic-based control devices and large renewable resources. This makes it important to develop real-time dynamic security assessment under uncertainty, whose main tool is time-domain simulation. Although there have been many efforts to improve the computational performance of time-domain simulation, their focus has been on deterministic differential-algebraic equations (DAEs) that do not model the uncertainties inherent in power system networks. To this end, this paper investigates large-scale time-domain simulation that includes the effects of stochastic perturbations, along with ways to enhance its computational performance. In particular, it utilizes the parallel-in-time (Parareal) algorithm, which has shown great potential, to solve stochastic DAEs (SDAEs) efficiently. A general procedure to compute the numerical solution of SDAEs with the Parareal algorithm is described. Numerical case studies with the 10-generator 39-bus system and the 327-generator 2383-bus system are performed to demonstrate its feasibility and efficiency. We also discuss the feasibility of employing semi-analytical solution methods, based on the Adomian decomposition method, to solve SDAEs. The proposed simulation framework provides a general solution scheme and has the potential for fast and large-scale stochastic power system dynamic simulations.
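For reference, the standard Parareal iteration for a deterministic initial-value problem combines a cheap coarse propagator $\mathcal{G}$ with an accurate fine propagator $\mathcal{F}$ (this generic form is shown only for orientation; its adaptation to SDAEs is what the paper develops):

$$
U^{k+1}_{n+1} \;=\; \mathcal{G}\!\left(t_{n+1}, t_n, U^{k+1}_{n}\right)
\;+\; \mathcal{F}\!\left(t_{n+1}, t_n, U^{k}_{n}\right)
\;-\; \mathcal{G}\!\left(t_{n+1}, t_n, U^{k}_{n}\right),
$$

where the fine propagations over all time windows at iteration $k$ are independent and can be computed in parallel.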
A parallel algorithm is presented in this article to efficiently solve the optimal consensus problem of multiagent systems. By utilizing a Jacobi-type proximal alternating direction method of multipliers (ADMM) framework, the optimization process is divided into two independent subproblems that can be solved in parallel to improve computational efficiency, followed by the Lagrangian multiplier update. The convergence analysis of the proposed algorithm is performed using convex optimization theory, deriving convergence conditions on the auxiliary parameters. Furthermore, the accelerated algorithm enjoys a convergence rate of $O(1/t^2)$ by adjusting the auxiliary parameters adaptively. To leverage the strengths of the collaboration of multiagent systems, a distributed implementation of the proposed parallel algorithm is further developed, where each agent addresses its private subproblems using only its own and its neighbors' information. Numerical simulations demonstrate the effectiveness of the theoretical results.
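For orientation (this generic problem form is an assumption, not quoted from the article), the optimal consensus problem is typically posed as agents minimizing a sum of private costs subject to agreement, which is what makes a Jacobi-type, fully parallel subproblem sweep possible:

$$
\min_{x_1,\dots,x_N,\;z} \ \sum_{i=1}^{N} f_i(x_i)
\qquad \text{s.t.} \quad x_i = z, \quad i = 1,\dots,N,
$$

where $f_i$ is private to agent $i$ and the coupling constraint enforces consensus on $z$; each $x_i$-subproblem is then an independent proximal step.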
Block matrices with simultaneously diagonalizable blocks arise in diverse application areas, including, e.g., numerical methods for engineering based on partial differential equations, as well as network synchronization, cryptography, and control theory. In the present paper, we develop a parallel algorithm for the inversion of $m \times m$ block matrices with simultaneously diagonalizable blocks of order $n$. First, a sequential version of the algorithm is presented and its computational complexity is determined. Then, a parallelization of the algorithm is implemented and analyzed. The complexity of the derived parallel algorithm is expressed as a function of $m$ and $n$ as well as of the number of utilized CPU threads. Results of numerical experiments demonstrate the CPU time superiority of the parallel algorithm versus the respective sequential version and a standard inversion method applied to the original block matrix. An efficient parallelizable procedure to compute the determinants of such block matrices is also described. Numerical examples are presented for using the developed serial and parallel inversion algorithms for boundary-value problems involving transmission problems for the Helmholtz partial differential equation in piecewise homogeneous media.
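The structural fact such algorithms exploit (stated here in standard notation as a sketch, not as the paper's exact derivation) is that a common eigenvector matrix $P$ decouples the block inversion into $n$ small independent inversions:

$$
A_{ij} = P\,\Lambda_{ij}\,P^{-1}
\;\;\Longrightarrow\;\;
A = (I_m \otimes P)\,\tilde{A}\,(I_m \otimes P)^{-1},
$$

where $\tilde{A}$ has diagonal blocks $\Lambda_{ij}$ and, after a perfect-shuffle permutation, splits into $n$ independent $m \times m$ matrices $S^{(k)} = \big[\lambda^{(k)}_{ij}\big]_{i,j}$, one per eigenvalue index $k$; inverting the $S^{(k)}$ in parallel then yields $A^{-1}$.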
Field programmable gate array (FPGA) based hardware-in-the-loop (HIL) simulation is an effective tool to verify the performance of physical controllers and shorten the development cycle of power converters. In HIL simulations, sampling accuracy is a key concern and is usually improved by reducing the step size. However, due to the cost in computational time, the step size cannot be reduced indefinitely to meet the requirements of high switching frequency applications. To improve the sampling accuracy and simulation performance of HIL simulation, this article proposes a semi-implicit parallel leapfrog (SPL) solver with a half-step sampling technique. In this solver, the switches and the rest of the system are computed in parallel when the switch-leg model operates in continuous current mode. In addition, the solver is formulated in leapfrog form to reduce computational cost and to advance in half-steps as the minimum step size. With this formulation, the half-step sampling technique can be employed to double the sampling rate, even in cases where it is challenging to reduce the simulation step size further. A dual active bridge converter case is implemented on an FPGA board with a 12.5-ns sampling step size, retaining the simulation accuracy while switching at 400 kHz. To further verify the advantages, the results are compared with another HIL method and with experimental results.
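For reference, a generic staggered leapfrog update (not the SPL solver's exact discretization) advances two state groups on interleaved half-step grids, which is why values become available every half step:

$$
v_{n+\frac{1}{2}} = v_{n-\frac{1}{2}} + \Delta t\, f\!\left(x_n\right),
\qquad
x_{n+1} = x_n + \Delta t\, v_{n+\frac{1}{2}},
$$

so reading both grids together provides samples at intervals of $\Delta t/2$.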
Long read technologies are continuing to evolve at a rapid pace, with the latest of the high fidelity technologies delivering reads over 10 Kbp with high accuracy. Classical long read assemblers produce assemblies directly from long reads. Hybrid assembly workflows provide ways to combine partially constructed assemblies (or contigs) with newly sequenced long reads in order to generate genomic scaffolds. Under either setting, the main computational bottleneck is the step of mapping the long reads. While many tools implement the mapping step through overlap computations, designing alignment-free approaches is necessary for large-scale computations. In this paper, we visit the problem of mapping long reads to a database of subject sequences, in a fast and accurate manner. We present an efficient parallel algorithmic workflow, called JEM-mapper, that uses a new minimizer-based Jaccard estimator (or JEM) sketch to perform alignment-free mapping of long reads. For implementation and evaluation, we consider two application settings: (i) the hybrid scaffolding setting, which aims to map long reads to partial assemblies; and (ii) the classical long read assembly setting, which aims to map long reads to one another. We implemented an MPI+OpenMP version of JEM-mapper to enable parallelism at both distributed- and shared-memory layers. Experimental evaluation shows that JEM-mapper produces high-quality mapping while significantly improving the time to solution compared to state-of-the-art tools; e.g., in the hybrid setting for a large genome with 429K HiFi long reads and 98K contigs, JEM-mapper produces a mapping with 99.41% precision and 97.91% recall, and 6.9x speedup over a state-of-the-art mapper.
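To illustrate sketch-based, alignment-free similarity estimation, a minimal C++ bottom-s MinHash example over k-mers follows; it is not the minimizer-based JEM sketch of the paper, and k, the sketch size, and the hash function are arbitrary illustrative choices.

```cpp
// Sketch: alignment-free Jaccard estimation between two sequences using a
// bottom-s MinHash sketch over k-mers. Illustrative only; NOT the minimizer-based
// JEM sketch from the paper, and k, s, and the hash are arbitrary choices.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <iterator>
#include <set>
#include <string>
#include <vector>

std::vector<uint64_t> bottomSketch(const std::string& seq, size_t k, size_t s) {
  std::set<uint64_t> hashes;                       // sorted, deduplicated
  std::hash<std::string> h;
  for (size_t i = 0; i + k <= seq.size(); ++i)
    hashes.insert(h(seq.substr(i, k)));            // hash every k-mer
  std::vector<uint64_t> sk(hashes.begin(), hashes.end());
  if (sk.size() > s) sk.resize(s);                 // keep the s smallest hashes
  return sk;
}

// Estimate Jaccard similarity from two bottom sketches (both sorted).
double jaccardEstimate(const std::vector<uint64_t>& a,
                       const std::vector<uint64_t>& b, size_t s) {
  std::vector<uint64_t> uni;
  std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(uni));
  if (uni.size() > s) uni.resize(s);               // bottom-s of the union
  size_t shared = 0;
  for (uint64_t x : uni)
    if (std::binary_search(a.begin(), a.end(), x) &&
        std::binary_search(b.begin(), b.end(), x)) ++shared;
  return uni.empty() ? 0.0 : static_cast<double>(shared) / uni.size();
}

int main() {
  std::string r1 = "ACGTACGTGGTTACGTAACGT", r2 = "ACGTACGTGGTTTCGTAACGT";
  auto s1 = bottomSketch(r1, 5, 16), s2 = bottomSketch(r2, 5, 16);
  std::printf("estimated Jaccard = %.3f\n", jaccardEstimate(s1, s2, 16));
}
```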