Link prediction can help rectify inaccuracies in various graph algorithms, stemming from unaccounted-for or overlooked links within networks. However, many existing works use a baseline approach, which incurs unnecessary computational costs due to its high time complexity. Further, many studies focus on smaller graphs, which can lead to misleading conclusions. Here, we study the prediction of links using neighborhood-based similarity measures on large graphs. In particular, we improve upon the baseline approach (IBase), and propose a heuristic approach that additionally disregards large hubs (DLH), based on the idea that high-degree nodes contribute little similarity among their neighbors. On a server equipped with dual 16-core Intel Xeon Gold 6226R processors, DLH is on average $1019\times$ faster than IBase, especially on web graphs and social networks, while maintaining similar prediction accuracy. Notably, DLH achieves a link prediction rate of 38.1M edges/s and improves performance by $1.6\times$ for every doubling of threads.
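As an illustration of the hub-skipping idea, a minimal C++ sketch is given below; the degree threshold and the Jaccard-style common-neighbor measure are assumptions for the example, not the paper's exact DLH formulation.

```cpp
// Sketch: neighborhood-based link prediction that skips large hubs (DLH idea).
// The degree threshold and the Jaccard similarity choice are illustrative
// assumptions, not the exact measures used in the paper.
#include <cstdio>
#include <unordered_set>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // adjacency lists

// Jaccard similarity of the neighborhoods of u and v, ignoring any common
// neighbor whose degree exceeds maxDegree (a "large hub").
double hubSkippingJaccard(const Graph& g, int u, int v, size_t maxDegree) {
  std::unordered_set<int> nu(g[u].begin(), g[u].end());
  size_t common = 0;
  for (int w : g[v]) {
    if (g[w].size() > maxDegree) continue;   // disregard large hubs
    if (nu.count(w)) ++common;
  }
  size_t uni = g[u].size() + g[v].size() - common;
  return uni ? static_cast<double>(common) / uni : 0.0;
}

int main() {
  Graph g = {{1, 2}, {0, 2, 3}, {0, 1, 3}, {1, 2}};
  std::printf("sim(0,3) = %.3f\n", hubSkippingJaccard(g, 0, 3, /*maxDegree=*/8));
}
```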
Graph neural networks (GNNs) are among the most powerful tools in deep learning. They routinely solve complex problems on unstructured networks, such as node classification, graph classification, or link prediction, with high accuracy. However, both inference and training of GNNs are complex, and they uniquely combine the features of irregular graph processing with dense and regular computations. This complexity makes it very challenging to execute GNNs efficiently on modern massively parallel architectures. To alleviate this, we first design a taxonomy of parallelism in GNNs, considering data and model parallelism, and different forms of pipelining. Then, we use this taxonomy to investigate the amount of parallelism in numerous GNN models, GNN-driven machine learning tasks, software frameworks, or hardware accelerators. We use the work-depth model, and we also assess communication volume and synchronization. We specifically focus on the sparsity/density of the associated tensors, in order to understand how to effectively apply techniques such as vectorization. We also formally analyze GNN pipelining, and we generalize the established Message-Passing class of GNN models to cover arbitrary pipeline depths, facilitating future optimizations. Finally, we investigate different forms of asynchronicity, navigating the path for future asynchronous parallel GNN pipelines. The outcomes of our analysis are synthesized in a set of insights that help to maximize GNN performance, and a comprehensive list of challenges and opportunities for further research into efficient GNN computations. Our work will help to advance the design of future GNNs.
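As a concrete instance of data parallelism over nodes, a minimal C++ sketch of one message-passing aggregation step follows; the mean aggregation, the OpenMP loop, and the absence of a learned update function are simplifying assumptions, not a description of any specific framework.

```cpp
// Sketch: one message-passing GNN aggregation step (mean over neighbors),
// parallelized over nodes with OpenMP. The feature dimension, the aggregation
// function, and the missing learned update are simplifying assumptions.
#include <cstdio>
#include <vector>

using Graph = std::vector<std::vector<int>>;        // adjacency lists
using Features = std::vector<std::vector<double>>;  // one row per node

Features meanAggregate(const Graph& g, const Features& h) {
  const int n = static_cast<int>(g.size());
  const size_t d = h.empty() ? 0 : h[0].size();
  Features out(n, std::vector<double>(d, 0.0));
  #pragma omp parallel for schedule(dynamic)        // data parallelism over nodes
  for (int v = 0; v < n; ++v) {
    for (int u : g[v])
      for (size_t k = 0; k < d; ++k) out[v][k] += h[u][k];
    if (!g[v].empty())
      for (size_t k = 0; k < d; ++k) out[v][k] /= g[v].size();
  }
  return out;
}

int main() {
  Graph g = {{1, 2}, {0}, {0}};
  Features h = {{1.0, 0.0}, {0.0, 1.0}, {2.0, 2.0}};
  Features z = meanAggregate(g, h);
  std::printf("z[0] = (%.2f, %.2f)\n", z[0][0], z[0][1]);  // mean of nodes 1 and 2
}
```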
Various classic reasoning problems with natural hypergraph representations are known to be tractable if a hypertree decomposition (HD) of low width exists. The resulting algorithms are attractive for practical use in fields like databases and constraint satisfaction. However, algorithmic use of HDs relies on the difficult task of first computing a decomposition of the hypergraph underlying a given problem instance, which is then used to guide the algorithm for this particular instance. The performance of purely sequential methods for computing HDs is inherently limited, yet the problem is, theoretically, amenable to parallelisation. In this article, we propose the first algorithm for computing hypertree decompositions that is well suited for parallelisation. The newly proposed algorithm log-k-decomp requires only a logarithmic number of recursion levels and additionally allows for highly parallelised pruning of the search space by restriction to so-called balanced separators. We provide a detailed experimental evaluation over the HyperBench benchmark and demonstrate that log-k-decomp outperforms the current state of the art significantly.
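The logarithmic number of recursion levels follows from a standard halving argument: under the usual notion of a balanced separator (removing it leaves components that each contain at most half of the hypergraph's edges, an assumption stated here for illustration), the depth bound is immediate:

$$
|E(C_i)| \le \tfrac{1}{2}\,|E(H)| \ \text{ for every remaining component } C_i
\quad\Longrightarrow\quad
\text{recursion depth} \;\le\; \lceil \log_2 |E(H)| \rceil .
$$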
Computing intersections among sets of one-dimensional intervals is a ubiquitous problem in computational geometry with important applications in bioinformatics, where the size of typical inputs is large and it is therefore important to use efficient algorithms. In this paper we propose a parallel algorithm for the 1D intersection-counting problem, that is, the problem of counting the number of intersections between each interval in a given set A and every interval in a set B. Our algorithm is suitable for shared-memory architectures (e.g., multicore CPUs) and GPUs. The algorithm is work-efficient because it performs the same amount of work as the best serial algorithm for this kind of problem. Our algorithm has been implemented in C++ using the Thrust parallel algorithms library, enabling the generation of optimized programs for multicore CPUs and GPUs from the same source code. The performance of our algorithm is evaluated on synthetic and real datasets, showing good scalability on different generations of hardware.
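As a point of reference, the counting problem can be solved per query with sorted endpoints and binary search, as in the C++ sketch below; this is a standard formulation, not the Thrust-based algorithm of the paper.

```cpp
// Sketch: count, for each interval in A, how many intervals in B it intersects,
// using sorted endpoints and binary search. Standard formulation for illustration,
// not the specific Thrust-based algorithm from the paper.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Interval { double lo, hi; };

std::vector<long> countIntersections(const std::vector<Interval>& A,
                                     const std::vector<Interval>& B) {
  std::vector<double> starts, ends;
  for (const auto& b : B) { starts.push_back(b.lo); ends.push_back(b.hi); }
  std::sort(starts.begin(), starts.end());
  std::sort(ends.begin(), ends.end());

  std::vector<long> counts(A.size());
  #pragma omp parallel for                       // each query is independent
  for (long i = 0; i < static_cast<long>(A.size()); ++i) {
    // Intervals of B that do NOT intersect A[i] either start after A[i].hi
    // or end before A[i].lo; every other interval intersects.
    long startAfter = starts.end() -
        std::upper_bound(starts.begin(), starts.end(), A[i].hi);
    long endBefore  = std::lower_bound(ends.begin(), ends.end(), A[i].lo) -
        ends.begin();
    counts[i] = static_cast<long>(B.size()) - startAfter - endBefore;
  }
  return counts;
}

int main() {
  std::vector<Interval> A = {{1, 4}, {6, 7}};
  std::vector<Interval> B = {{0, 2}, {3, 5}, {8, 9}};
  for (long c : countIntersections(A, B)) std::printf("%ld\n", c);  // prints 2, 0
}
```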
In this paper, we analyze two types of refutations for Unit Two Variable Per Inequality (UTVPI) constraints. A UTVPI constraint is a linear inequality of the form: $a_{i}\cdot x_{i}+a_{j} \cdot x_{j} \le b_{k}$, where $a_{i},a_{j}\in \{0,1,-1\}$ and $b_{k} \in \mathbb{Z}$. A conjunction of such constraints is called a UTVPI constraint system (UCS) and can be represented in matrix form as: $\mathbf{A} \cdot \mathbf{x} \le \mathbf{b}$. UTVPI constraints are used in many domains including operations research and program verification. We focus on two variants of read-once refutation (ROR). An ROR is a refutation in which each constraint is used at most once. A literal-once refutation (LOR), a more restrictive form of ROR, is a refutation in which each literal ($x_i$ or $-x_i$) is used at most once. First, we examine the constraint-required read-once refutation (CROR) problem and the constraint-required literal-once refutation (CLOR) problem. In both of these problems, we are given a set of constraints that must be used in the refutation. RORs and LORs are incomplete since not every system of linear constraints is guaranteed to have such a refutation. This is still true even when we restrict ourselves to UCSs. In this paper, we provide NC reductions between the CROR and CLOR problems in UCSs and the minimum weight perfect matching problem. The reductions used in this paper assume a CREW PRAM model of parallel computation. As a result, the reductions establish that, from the perspective of parallel algorithms, the CROR and CLOR problems in UCSs are equivalent to matching. In particular, if an NC algorithm exists for either of these problems, then there is an NC algorithm for matching.
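As a small illustration (constructed here, not taken from the paper), the following UCS admits a refutation that is both read-once and literal-once: summing the three constraints, with each constraint used exactly once and each literal appearing exactly once, yields the contradiction $0 \le -1$:

$$
(x_1 + x_2 \le 0) \;+\; (-x_1 + x_3 \le -1) \;+\; (-x_2 - x_3 \le 0)
\;\;\Longrightarrow\;\; 0 \le -1 .
$$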
With increasing grid modernization efforts, future electric grids will be governed by more complex and faster dynamics due to the high penetration of new components such as power electronic-based control devices and large renewable resources. This makes it important to develop real-time dynamic security assessment under uncertainty, whose main tool is time-domain simulation. Although there have been many efforts to improve the computational performance of time-domain simulation, their focus has been on deterministic differential-algebraic equations (DAEs) that do not model the uncertainties inherent in power system networks. To this end, this paper investigates large-scale time-domain simulation that includes the effects of stochastic perturbations, along with ways to enhance its computational performance. In particular, it utilizes the parallel-in-time (Parareal) algorithm, which has shown great potential, to solve stochastic DAEs (SDAEs) efficiently. A general procedure to compute the numerical solution of SDAEs with the Parareal algorithm is described. Numerical case studies with the 10-generator 39-bus system and the 327-generator 2383-bus system are performed to demonstrate its feasibility and efficiency. We also discuss the feasibility of employing semi-analytical solution methods, based on the Adomian decomposition method, to solve SDAEs. The proposed simulation framework provides a general solution scheme and has the potential for fast and large-scale stochastic power system dynamic simulations.
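For reference, the standard Parareal iteration for a deterministic initial-value problem combines a cheap coarse propagator $\mathcal{G}$ with an accurate fine propagator $\mathcal{F}$ (this generic form is shown only for orientation; its adaptation to SDAEs is what the paper develops):

$$
U^{k+1}_{n+1} \;=\; \mathcal{G}\!\left(t_{n+1}, t_n, U^{k+1}_{n}\right)
\;+\; \mathcal{F}\!\left(t_{n+1}, t_n, U^{k}_{n}\right)
\;-\; \mathcal{G}\!\left(t_{n+1}, t_n, U^{k}_{n}\right),
$$

where the fine propagations over all time windows at iteration $k$ are independent and can be computed in parallel.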
A parallel algorithm is presented in this article to efficiently solve the optimal consensus problem of multiagent systems. By utilizing a Jacobi-type proximal alternating direction method of multipliers (ADMM) framework, the optimization process is divided into two independent subproblems that can be solved in parallel to improve computational efficiency, followed by the Lagrangian multiplier update. The convergence analysis of the proposed algorithm is performed using convex optimization theory, deriving convergence conditions on the auxiliary parameters. Furthermore, the accelerated algorithm enjoys a convergence rate of $O(1/t^2)$ by adjusting the auxiliary parameters adaptively. To leverage the strengths of the collaboration of multiagent systems, a distributed implementation of the proposed parallel algorithm is further developed, where each agent addresses its private subproblems using only its own and its neighbors' information. Numerical simulations demonstrate the effectiveness of the theoretical results.
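For orientation (this generic problem form is an assumption, not quoted from the article), the optimal consensus problem is typically posed as agents minimizing a sum of private costs subject to agreement, which is what makes a Jacobi-type, fully parallel subproblem sweep possible:

$$
\min_{x_1,\dots,x_N,\;z} \ \sum_{i=1}^{N} f_i(x_i)
\qquad \text{s.t.} \quad x_i = z, \quad i = 1,\dots,N,
$$

where $f_i$ is private to agent $i$ and the coupling constraint enforces consensus on $z$; each $x_i$-subproblem is then an independent proximal step.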
Block matrices with simultaneously diagonalizable blocks arise in diverse application areas, including, e.g., numerical methods for engineering based on partial differential equations, as well as network synchronization, cryptography, and control theory. In the present paper, we develop a parallel algorithm for the inversion of $m \times m$ block matrices with simultaneously diagonalizable blocks of order $n$. First, a sequential version of the algorithm is presented and its computational complexity is determined. Then, a parallelization of the algorithm is implemented and analyzed. The complexity of the derived parallel algorithm is expressed as a function of $m$ and $n$ as well as of the number of utilized CPU threads. Results of numerical experiments demonstrate the CPU time superiority of the parallel algorithm versus the respective sequential version and a standard inversion method applied to the original block matrix. An efficient parallelizable procedure to compute the determinants of such block matrices is also described. Numerical examples are presented for using the developed serial and parallel inversion algorithms for boundary-value problems involving transmission problems for the Helmholtz partial differential equation in piecewise homogeneous media.
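The structural fact such algorithms exploit (stated here in standard notation as a sketch, not as the paper's exact derivation) is that a common eigenvector matrix $P$ decouples the block inversion into $n$ small independent inversions:

$$
A_{ij} = P\,\Lambda_{ij}\,P^{-1}
\;\;\Longrightarrow\;\;
A = (I_m \otimes P)\,\tilde{A}\,(I_m \otimes P)^{-1},
$$

where $\tilde{A}$ has diagonal blocks $\Lambda_{ij}$ and, after a perfect-shuffle permutation, splits into $n$ independent $m \times m$ matrices $S^{(k)} = \big[\lambda^{(k)}_{ij}\big]_{i,j}$, one per eigenvalue index $k$; inverting the $S^{(k)}$ in parallel then yields $A^{-1}$.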
Field programmable gate array (FPGA) based hardware-in-the-loop (HIL) simulation is an effective tool to verify the performance of physical controllers and shorten the development cycle of power converters. In HIL simulations, sampling accuracy is a key concern and is usually improved by reducing the step size. However, due to the cost in computational time, the step size cannot be reduced indefinitely to meet the requirements of high switching frequency applications. To improve the sampling accuracy and simulation performance of HIL simulation, this article proposes a semi-implicit parallel leapfrog (SPL) solver with a half-step sampling technique. In this solver, the switches and the rest of the system are computed in parallel when the switch-leg model operates in continuous current mode. In addition, the solver is formulated in leapfrog form to reduce computational cost and to advance in half-steps as the minimum step size. With this formulation, the half-step sampling technique can be employed to double the sampling rate, even in cases where it is challenging to reduce the simulation step size further. A dual active bridge converter case is implemented on an FPGA board with a 12.5-ns sampling step size, retaining the simulation accuracy while switching at 400 kHz. To further verify the advantages, the results are compared with another HIL method and with experimental results.
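For reference, a generic staggered leapfrog update (not the SPL solver's exact discretization) advances two state groups on interleaved half-step grids, which is why values become available every half step:

$$
v_{n+\frac{1}{2}} = v_{n-\frac{1}{2}} + \Delta t\, f\!\left(x_n\right),
\qquad
x_{n+1} = x_n + \Delta t\, v_{n+\frac{1}{2}},
$$

so reading both grids together provides samples at intervals of $\Delta t/2$.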
Long read technologies are continuing to evolve at a rapid pace, with the latest of the high fidelity technologies delivering reads over 10 Kbp with high accuracy. Classical long read assemblers produce assemblies directly from long reads. Hybrid assembly workflows provide ways to combine partially constructed assemblies (or contigs) with newly sequenced long reads in order to generate genomic scaffolds. Under either setting, the main computational bottleneck is the step of mapping the long reads. While many tools implement the mapping step through overlap computations, designing alignment-free approaches is necessary for large-scale computations. In this paper, we visit the problem of mapping long reads to a database of subject sequences, in a fast and accurate manner. We present an efficient parallel algorithmic workflow, called JEM-mapper, that uses a new minimizer-based Jaccard estimator (or JEM) sketch to perform alignment-free mapping of long reads. For implementation and evaluation, we consider two application settings: (i) the hybrid scaffolding setting, which aims to map long reads to partial assemblies; and (ii) the classical long read assembly setting, which aims to map long reads to one another. We implemented an MPI+OpenMP version of JEM-mapper to enable parallelism at both distributed- and shared-memory layers. Experimental evaluation shows that JEM-mapper produces high-quality mapping while significantly improving the time to solution compared to state-of-the-art tools; e.g., in the hybrid setting for a large genome with 429K HiFi long reads and 98K contigs, JEM-mapper produces a mapping with 99.41% precision and 97.91% recall, and 6.9x speedup over a state-of-the-art mapper.
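To illustrate sketch-based, alignment-free similarity estimation, a minimal C++ bottom-s MinHash example over k-mers follows; it is not the minimizer-based JEM sketch of the paper, and k, the sketch size, and the hash function are arbitrary illustrative choices.

```cpp
// Sketch: alignment-free Jaccard estimation between two sequences using a
// bottom-s MinHash sketch over k-mers. Illustrative only; NOT the minimizer-based
// JEM sketch from the paper, and k, s, and the hash are arbitrary choices.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <iterator>
#include <set>
#include <string>
#include <vector>

std::vector<uint64_t> bottomSketch(const std::string& seq, size_t k, size_t s) {
  std::set<uint64_t> hashes;                       // sorted, deduplicated
  std::hash<std::string> h;
  for (size_t i = 0; i + k <= seq.size(); ++i)
    hashes.insert(h(seq.substr(i, k)));            // hash every k-mer
  std::vector<uint64_t> sk(hashes.begin(), hashes.end());
  if (sk.size() > s) sk.resize(s);                 // keep the s smallest hashes
  return sk;
}

// Estimate Jaccard similarity from two bottom sketches (both sorted).
double jaccardEstimate(const std::vector<uint64_t>& a,
                       const std::vector<uint64_t>& b, size_t s) {
  std::vector<uint64_t> uni;
  std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(uni));
  if (uni.size() > s) uni.resize(s);               // bottom-s of the union
  size_t shared = 0;
  for (uint64_t x : uni)
    if (std::binary_search(a.begin(), a.end(), x) &&
        std::binary_search(b.begin(), b.end(), x)) ++shared;
  return uni.empty() ? 0.0 : static_cast<double>(shared) / uni.size();
}

int main() {
  std::string r1 = "ACGTACGTGGTTACGTAACGT", r2 = "ACGTACGTGGTTTCGTAACGT";
  auto s1 = bottomSketch(r1, 5, 16), s2 = bottomSketch(r2, 5, 16);
  std::printf("estimated Jaccard = %.3f\n", jaccardEstimate(s1, s2, 16));
}
```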