ISBN (print): 9789819750245; 9789819750252
Pairings are useful tools in cryptography, and efficient implementations are critical to their use; Miller's algorithm is the main method for computing all pairings. As an alternative approach, elliptic nets were first employed to evaluate Tate pairings and were later generalized to hyperelliptic nets for Tate pairings on hyperelliptic curves. In this work, for hyperelliptic pairings derived from rational functions, we establish unitary formulae in terms of hyperelliptic nets. We then construct, for genus-2 hyperelliptic pairings, a parallel Double-and-Add algorithm on the minimal block. In particular, all terms in new blocks, whose formulae on the current blocks are independent of one another, can be evaluated with 12 processors in parallel, so the explicit loop cost reduces to 4M' (multiplications in extension fields) with 276 parallel processors. As an additional merit, the Double and Double-Add algorithms invoke analogous operations, so our method avoids the extra additions in Miller's algorithm.
A mathematical model is developed and numerical modeling is performed to solve a scientific and industrial problem in the field of studying mass transfer processes in the "fracture set - matrix" system in a ...
We define a new class of predicates called equilevel predicates on a distributive lattice which eases the analysis of parallel algorithms. Many combinatorial problems such as the vertex cover problem, the bipartite ma...
Neural algorithmic reasoners are parallel processors. Teaching them sequential algorithms contradicts this nature, rendering a significant share of their computations redundant. Parallel algorithms, however, can exploit their full computational power and therefore require fewer layers to be executed. This drastically reduces training times, as we observe when comparing parallel implementations of searching, sorting, and finding strongly connected components to their sequential counterparts on the CLRS framework. Additionally, the parallel versions achieve (often strongly) superior predictive performance.
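To make the sequential-versus-parallel contrast concrete, here is a minimal sketch of odd-even transposition sort, a textbook parallel sorting scheme. The abstract does not say which parallel sort the authors used, so this choice and the function name `odd_even_transposition_sort` are assumptions made for illustration only. Within each phase the compared pairs touch disjoint positions, so on a parallel machine every pair could be handled in the same step; the inner loop below only simulates that.

```python
def odd_even_transposition_sort(values):
    """Sort a sequence with odd-even transposition sort (illustrative sketch)."""
    a = list(values)
    n = len(a)
    # n phases are enough to sort n elements with this scheme.
    for phase in range(n):
        start = phase % 2  # even phases compare (0,1),(2,3),...; odd phases (1,2),(3,4),...
        # The pairs below are disjoint, so a parallel machine could
        # process all of them in one step; here they run one by one.
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


if __name__ == "__main__":
    print(odd_even_transposition_sort([5, 2, 9, 1, 5, 6]))
```

The parallel depth is the number of phases (n), whereas a sequential comparison sort performs its comparisons one after another.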
This paper explores the application of parallel algorithms and high-performance computing (HPC) in the processing and forecasting of large-scale water demand data. Building upon prior work, which identified the need for more robust and scalable forecasting models, this study integrates parallel computing frameworks such as Apache Spark for distributed data processing, Message Passing Interface (MPI) for fine-grained parallel execution, and CUDA-enabled GPUs for deep learning acceleration. These advancements significantly improve model training and deployment speed, enabling near-real-time data processing. Apache Spark's in-memory computing and distributed data handling optimize data preprocessing and model execution, while MPI provides enhanced control over custom parallel algorithms, ensuring high performance in complex simulations. By leveraging these techniques, urban water utilities can implement scalable, efficient, and reliable forecasting solutions critical for sustainable water resource management in increasingly complex environments. Additionally, expanding these models to larger datasets and diverse regional contexts will be essential for validating their robustness and applicability in different urban settings. Addressing these challenges will help bridge the gap between theoretical advancements and practical implementation, ensuring that HPC-driven forecasting models provide actionable insights for real-world water management decision-making.
ISBN (print): 9798350308600
Jaccard similarity between a pair of vertices in a graph measures the relative overlap among their adjacent vertices. This metric is used to estimate the strength of existing edges and to predict new edges between pairs of disconnected vertices. Computing Jaccard similarity for all pairs of vertices or for all edges is computationally expensive. Existing sequential and parallel algorithms are either too slow or do not scale well to large-scale graphs. We present a shared-memory parallel algorithm for computing Jaccard weights. Our algorithm relies on sparse linear algebraic operations that utilize masking, semirings, vector iterators, and other GraphBLAS features for performance. Our implementation, albeit simple, outperforms recent state-of-the-art implementations by a factor of up to 20x and exhibits an average speedup of 9x.
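As a rough illustration of the quantity being computed, the sketch below evaluates the Jaccard weight of every edge of a small undirected graph with plain Python sets. It is sequential and does not use GraphBLAS; the function name `jaccard_edge_weights` and the adjacency-dict input format are assumptions for this example, not the authors' implementation. In linear-algebra terms, the common-neighbour count for an edge (u, v) is the (u, v) entry of A·A, kept only where the adjacency matrix A is nonzero (the "mask").

```python
def jaccard_edge_weights(adj):
    """Return {(u, v): Jaccard(u, v)} for every undirected edge with u < v.

    adj maps each vertex to the set of its neighbours (assumed symmetric,
    no self-loops).
    """
    weights = {}
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:  # visit each undirected edge once
                common = len(adj[u] & adj[v])
                union = len(adj[u]) + len(adj[v]) - common
                weights[(u, v)] = common / union
    return weights


if __name__ == "__main__":
    graph = {
        0: {1, 2},
        1: {0, 2, 3},
        2: {0, 1},
        3: {1},
    }
    for edge, w in sorted(jaccard_edge_weights(graph).items()):
        print(edge, round(w, 3))
```

Each edge is handled independently of the others, which is what makes the computation a natural fit for shared-memory parallelism.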
We consider two difference schemes that describe the convective-diffusion transfer and settling of multifractional suspensions in coastal systems. The first is based on an explicit-implicit scheme with reduced cost of...
We present the first parallel algorithms that decide strong and branching bisimilarity in linear time. More precisely, if a transition system has n states, m transitions and |Act| action labels, we introduce an algorithm that decides strong bisimilarity in O(n + |Act|) time on max(n, m) processors and an algorithm that decides branching bisimilarity in O(n + |Act|) time using up to max(n^2, m, |Act|n) processors.
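For readers unfamiliar with the problem, the following sketch decides strong bisimilarity with naive signature-based partition refinement. It is a simple sequential baseline, not the linear-time parallel algorithm of the abstract; the function name `strong_bisimulation_classes` and the `(source, action, target)` triple format are assumptions for illustration. Two states end up in the same class exactly when they are strongly bisimilar.

```python
def strong_bisimulation_classes(states, transitions):
    """Map each state to the id of its strong-bisimilarity class.

    transitions is a list of (source, action, target) triples.
    Naive refinement: repeat until no block splits.
    """
    block = {s: 0 for s in states}  # start with a single block
    while True:
        # Signature of a state: which blocks it can reach with which action.
        sig = {s: set() for s in states}
        for src, act, dst in transitions:
            sig[src].add((act, block[dst]))
        # States stay together only if old block and signature both agree.
        ids = {}
        new_block = {}
        for s in states:
            key = (block[s], frozenset(sig[s]))
            new_block[s] = ids.setdefault(key, len(ids))
        if len(ids) == len(set(block.values())):
            return new_block  # no block was split: partition is stable
        block = new_block


if __name__ == "__main__":
    states = ["p0", "p1", "q0", "q1", "q2", "r0", "r1"]
    transitions = [
        ("p0", "a", "p1"), ("p1", "b", "p0"),
        ("q0", "a", "q1"), ("q1", "b", "q2"), ("q2", "a", "q1"),
        ("r0", "a", "r1"),
    ]
    classes = strong_bisimulation_classes(states, transitions)
    print(classes["p0"] == classes["q0"], classes["p0"] == classes["r0"])
```

The signature of each state is built from its own outgoing transitions only, which hints at why assigning processors to states and transitions is a natural way to parallelize refinement.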
ISBN (digital): 9798350352917
ISBN (print): 9798350352924; 9798350352917
We consider a sparse matrix-matrix multiplication (SpGEMM) setting where one matrix is square and the other is tall and skinny. This special variant, TS-SpGEMM, has important applications in multi-source breadth-first search, influence maximization, sparse graph embedding, and algebraic multi-grid solvers. Unfortunately, popular distributed algorithms like sparse SUMMA deliver suboptimal performance for TS-SpGEMM. To address this limitation, we develop a novel distributed-memory algorithm tailored for TS-SpGEMM. Our approach employs customized 1D partitioning for all matrices involved and leverages sparsity-aware tiling for efficient data transfers. In addition, it minimizes communication overhead by incorporating both local and remote computations. On average, our TS-SpGEMM algorithm attains 5x performance gains over 2D and 3D SUMMA. Furthermore, we use our algorithm to implement multi-source breadth-first search and sparse graph embedding algorithms and demonstrate their scalability up to 512 nodes (or 65,536 cores) on NERSC Perlmutter.
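The link between TS-SpGEMM and multi-source breadth-first search can be sketched on a single process as follows. The graph's n x n adjacency structure plays the role of the square matrix, and the frontier is a sparse n x k matrix with one column per source; each BFS level is one sparse product over a Boolean (OR, AND) semiring, masked by the entries already visited. The function name `multi_source_bfs` and the dict-of-sets storage are assumptions for illustration; nothing here reproduces the distributed partitioning or tiling described in the abstract.

```python
def multi_source_bfs(adj, sources):
    """Return {(vertex, j): level} where j indexes sources[j].

    adj maps each vertex to an iterable of its out-neighbours.
    The frontier is a sparse n x k Boolean matrix stored row-wise:
    row v -> set of source indices j whose search currently sits at v.
    """
    frontier = {}
    for j, s in enumerate(sources):
        frontier.setdefault(s, set()).add(j)
    visited = {}
    levels = {}
    level = 0
    while frontier:
        # Record the current frontier entries as visited at this level.
        for v, cols in frontier.items():
            visited.setdefault(v, set()).update(cols)
            for j in cols:
                levels[(v, j)] = level
        # Sparse product: next[w] |= frontier[v] for every edge v -> w,
        # keeping only entries not visited yet (the mask).
        nxt = {}
        for v, cols in frontier.items():
            for w in adj.get(v, ()):
                new_cols = cols - visited.get(w, set())
                if new_cols:
                    nxt.setdefault(w, set()).update(new_cols)
        frontier = nxt
        level += 1
    return levels


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    result = multi_source_bfs(path, sources=[0, 3])
    print(result[(3, 0)], result[(0, 1)])
```

With k sources and k much smaller than n, the frontier matrix is tall and skinny, which is the shape the abstract's algorithm is tailored to.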
ISBN (print): 9781611977929
The densest subgraph problem has received significant attention, both in theory and in practice, due to its applications in problems such as community detection, social network analysis, and spam detection. Due to the high cost of obtaining exact solutions, much attention has focused on designing approximate densest subgraph algorithms. However, existing approaches are not able to scale to massive graphs with billions of edges. In this paper, we introduce a new framework that combines approximate densest subgraph algorithms with a pruning optimization. We design new parallel variants of the state-of-the-art sequential Greedy++ algorithm, and plug them into our framework in conjunction with a parallel pruning technique based on k-core decomposition to obtain parallel (1+epsilon)-approximate densest subgraph algorithms. On a single thread, our algorithms achieve 2.6-34x speedup over Greedy++, and obtain up to 22.37x self-relative parallel speedup on a 30-core machine with two-way hyper-threading. Compared with the state-of-the-art parallel algorithm by Harb et al. [NeurIPS'22], we achieve up to a 114x speedup on the same machine. Finally, against the recent sequential algorithm of Xu et al. [PACMMOD'23], we achieve up to a 25.9x speedup. The scalability of our algorithms enables us to obtain near-optimal density statistics on the hyperlink2012 (with roughly 113 billion edges) and clueweb (with roughly 37 billion edges) graphs for the first time in the literature.
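As background for the peeling idea that Greedy++ builds on, here is a minimal sequential sketch of the classical greedy peeling heuristic: repeatedly remove a minimum-degree vertex and remember the densest intermediate subgraph, where density is edges divided by vertices. This single pass is a well-known 2-approximation; it is not Greedy++ itself, nor the parallel or pruned variants from the abstract. The function name `greedy_peeling_density` and the edge-list input (a simple undirected graph without self-loops, with comparable vertex labels) are assumptions for this example.

```python
import heapq


def greedy_peeling_density(edges):
    """Return (best_density, best_vertex_set) found by min-degree peeling."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    m = sum(deg.values()) // 2  # number of remaining edges
    alive = set(adj)
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)

    best_density, best_size = 0.0, 0
    order = []  # vertices in the order they are peeled
    while alive:
        density = m / len(alive)
        if density > best_density:
            best_density, best_size = density, len(alive)
        # Pop a minimum-degree vertex, skipping stale heap entries.
        while True:
            d, v = heapq.heappop(heap)
            if v in alive and d == deg[v]:
                break
        alive.remove(v)
        order.append(v)
        for w in adj[v]:
            if w in alive:
                deg[w] -= 1
                m -= 1
                heapq.heappush(heap, (deg[w], w))
    # The best subgraph consists of the last best_size vertices peeled.
    best = set(order[len(order) - best_size:])
    return best_density, best


if __name__ == "__main__":
    # A 4-clique on {0, 1, 2, 3} with a pendant path 3 - 4 - 5.
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(greedy_peeling_density(k4 + [(3, 4), (4, 5)]))
```

The peeling order is also what a k-core decomposition computes, which is one way to see why k-core-based pruning pairs naturally with this family of algorithms.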