Many of the classic graph problems cannot be solved in the Massively parallel Computation setting (MPC) with strongly sublinear space per machine and o(log n) rounds, unless the 1-vs-2 cycles conjecture is false. This...
详细信息
ISBN:
(纸本)9781611977554
Many of the classic graph problems cannot be solved in the Massively parallel Computation setting (MPC) with strongly sublinear space per machine and o(log n) rounds, unless the 1-vs-2 cycles conjecture is false. This is true even on planar graphs. Such problems include, for example, counting connected components, bipartition, minimum spanning tree problem, (approximate) shortest paths, and (approximate) diameter/radius. In this paper, we show a way to get around this limitation. Specifically, we show that if we have a \nice" (for example, straight-line) embedding of the input graph, all the mentioned problems can be solved with O(n(2/3+epsilon)) space per machine in O(1) rounds. In conjunction with existing algorithms for computing the Delaunay triangulation, our results imply an MPC algorithm for exact Euclidean minimum spanning thee (EMST) that uses O (n(2/3+epsilon)) space per machine and finishes in O(1) rounds. This is the first improvement over a straightforward use of the standard Boruvka's algorithm with the Dauleanay triangulation algorithm of Goodrich [SODA 1997] which results in Theta(log n) rounds. This also partially negatively answers a question of Andoni, Nikolov, Onak, and Yaroslavtsev [STOC 2014], asking for lower bounds for exact EMST. We extend our algorithms to work with embeddings consisting of curves that are not "too squiggly" (as formalized by the total absolute curvature). We do this via a new lemma which we believe is of independent interest and could be used to parameterize other geometric problems by the total absolute curvature. We also state several open problems regarding massively parallel computation on planar graphs.
This tutorial offers a practical introduction to the fascinating world of quantum computation and its application to optimization and machine learning problems. The participants will acquire hands-on experience in dev...
详细信息
ISBN:
(纸本)9798400704130
This tutorial offers a practical introduction to the fascinating world of quantum computation and its application to optimization and machine learning problems. The participants will acquire hands-on experience in developing hybrid "Variational Quantum algorithms", which combine classical and quantum computation, and in running them both on simulators and real quantum hardware provided by leading ICT companies. As a concrete use-case, of specific interest for the HPDC community, the tutorial will discuss the optimal assignment and scheduling of resources on the different nodes and layers of a Cloud/Edge architecture, a problem that is known to have NP-hard complexity. The tutorial will show how Variational Quantum algorithms can become a viable alternative to classical algorithms to solve the resource assignment problem. The participants will be driven, step-by-step, to the reformulation of the problem in terms of an Ising problem, which is then solved through two variational quantum algorithms, i.e., Quantum Approximate Optimization Algorithm (QAOA) and Variational Quantum Eigensolver (VQE), by exploiting the libraries of IBM Qiskit and Xanadu Pennylane. Finally, the tutorial will discuss the perspectives on the use of quantum computation for this and other scenarios, and the possible avenues for future academic research and industrial developments.
Enumerating simple cycles has important applications in computational biology, network science, and financial crime analysis. In this work, we focus on parallelising the state-of-the-art simple cycle enumeration algor...
详细信息
ISBN:
(纸本)9781450391467
Enumerating simple cycles has important applications in computational biology, network science, and financial crime analysis. In this work, we focus on parallelising the state-of-the-art simple cycle enumeration algorithms by Johnson and Read-Tarjan along with their applications to temporal graphs. To our knowledge, we are the first ones to parallelise these two algorithms in a fine-grained manner. We are also the first to demonstrate experimentally a linear performance scaling. Such a scaling is made possible by our decomposition of long sequential searches into fine-grained tasks, which are then dynamically scheduled across CPU cores, enabling an optimal load balancing. Furthermore, we show that coarse-grained parallel versions of the Johnson and the Read-Tarjan algorithms that exploit edge- or vertex-level parallelism are not scalable. On a cluster of four multi-core CPUs with 256 physical cores, our fine-grained parallelalgorithms are, on average, an order of magnitude faster than their coarse-grained parallel counterparts. The performance gap between the fine-grained and the coarse-grained parallelalgorithms widens as we use more CPU cores. When using all 256 CPU cores, our parallelalgorithms enumerate temporal cycles, on average, 260x faster than the serial algorithm of Kumar and Calders. Code repository: https://***/IBM/parallel-cycle-enumeration
It is witnessed that heterogeneous architectures, such as GPUs, TPUs, and FPGAs, have made complex algorithms practical in the last decade. Despite multiple efforts to study the scheduling of these parallel and comple...
详细信息
ISBN:
(纸本)9798350311754
It is witnessed that heterogeneous architectures, such as GPUs, TPUs, and FPGAs, have made complex algorithms practical in the last decade. Despite multiple efforts to study the scheduling of these parallel and complex tasks on heterogeneous architectures, the power and energy consumption of the platforms have yet to be well managed under real-time task deadlines. To establish high schedulability in heterogeneous architectures, many scheduling strategies and models, such as multi-segment self-suspension (MSSS), have been proposed by pioneer researchers. However, directly applying this model to heterogeneous architectures with multiple CPUs and many processing elements (PEs) suffers aggravated power consumption due to the pessimism in the scheduling algorithm and the tolerance margin in the worst-case execution time (WCET) model. Therefore, this paper presents an energy-efficient real-time scheduling approach called EESchedule, which works on heterogeneous architectures with guaranteed schedulability and improved power efficiency. In EESchedule, we build a general task execution model for the general heterogeneous architectures integrating multiple CPUs and many PEs. Then, an energy-efficient real-time scheduling strategy is introduced. Next, the response time and corresponding schedulability analysis are presented for EESchedule. Finally, extensive experiments on heterogeneous NVIDIA Jetson TX2 embedded systems and GPU servers with the Intel i9-10900x CPU and RTX 3080 GPU demonstrate that the EESchedule could achieve the same schedulability with 16.8%-40.7% and 39.0%48.2% reduced power and energy consumption in comparison with state-of-the-art scheduling algorithms.
The proceedings contain 44 papers. The topics discussed include: deterministic distributed sparse and ultra-sparse spanners and connectivity certificates;fully polynomial-time distributed computation in low-treewidth ...
ISBN:
(纸本)9781450391467
The proceedings contain 44 papers. The topics discussed include: deterministic distributed sparse and ultra-sparse spanners and connectivity certificates;fully polynomial-time distributed computation in low-treewidth graphs;adaptive massively parallelalgorithms for cut problems;preparing for disaster: leveraging precomputation to efficiently repair graph structures upon failures;the energy complexity of Las Vegas leader election;a fully-distributed peer-to-peer protocol for byzantine-resilient distributed hash tables;brief announcement: the (limited) power of multiple identities: asynchronous byzantine reliable broadcast with improved resilience through collusion;brief announcement: composable dynamic secure emulation;and robust and optimal contention resolution without collision detection.
Communication lower bounds have long been established for matrix multiplication algorithms. However, most methods of asymptotic analysis have either ignored constant factors or not obtained the tightest possible value...
详细信息
ISBN:
(纸本)9781450391467
Communication lower bounds have long been established for matrix multiplication algorithms. However, most methods of asymptotic analysis have either ignored constant factors or not obtained the tightest possible values. The main result of this work is establishing memory-independent communication lower bounds with tight constants for parallel matrix multiplication. Our constants improve on previous work in each of three cases that depend on the relative sizes of the matrix aspect ratios and the number of processors.
This paper presents a parallel version of Goldberg's algorithm for the problem of single-source shortest paths with integer (including negatives) edge weights. Given an input graph with n vertices, m edges, and in...
详细信息
ISBN:
(纸本)9781450391467
This paper presents a parallel version of Goldberg's algorithm for the problem of single-source shortest paths with integer (including negatives) edge weights. Given an input graph with n vertices, m edges, and integerweights >= -N, our algorithms solves the problem with (O) over tilde (m root n log N) work and n(5/4+o(1)) log N span, both with high probability. Our algorithm thus has work similar to Goldberg's algorithm while also achieving at least m(1/4-o(1)) parallelism. To generate our parallel version of Goldberg's algorithm, we solve two specific distance-limited shortest-path problems, both with work (O) over tilde (m) and span root L center dot n(1/2+o(1)), where L is the distance limit.
In this short note, we give a novel algorithm for O(1) round triangle counting in bounded arboricity graphs. Counting triangles in O(1) rounds (exactly) is listed as one of the interesting remaining open problems in t...
详细信息
This paper presents an O (log log (d) over bar) round massively parallel algorithm for 1 +epsilon approximation of maximum weighted b-matchings, using near-linear memory per machine. Here (d) over bar denotes the aver...
详细信息
ISBN:
(纸本)9781450391467
This paper presents an O (log log (d) over bar) round massively parallel algorithm for 1 +epsilon approximation of maximum weighted b-matchings, using near-linear memory per machine. Here (d) over bar denotes the average degree in the graph and epsilon is an arbitrarily small positive constant. Recall that b -matching is the natural and well-studied generalization of the matching problem where different vertices are allowed to have different numbers of incident edges in the matching. Concretely, each vertex v is given a positive integer budget b(v) and it can have up to b(v) incident edges in the matching. Previously, there were known algorithms with round complexity O (log log n), or O(log logn) where Delta denotes maximum degree, for 1 + epsilon approximation of weighted matching and for maximal matching [Czumaj et al., STOC'18, Ghaffari et al. PODC'18;Assadi et al. SODA'19;Behnezhad et al. FOCS'19;Gamlath et al. PODC'19]. But these algorithms do not extend to the more general b-matching problem.
Trusted execution environment (TEE) promises strong security guarantee with hardware extensions for security-sensitive tasks. Due to its numerous benefits, TEE has gained widespread adoption, and extended from CPU-onl...
详细信息
ISBN:
(纸本)9798350326598;9798350326581
Trusted execution environment (TEE) promises strong security guarantee with hardware extensions for security-sensitive tasks. Due to its numerous benefits, TEE has gained widespread adoption, and extended from CPU-only TEEs to FPGA and GPU TEE systems. However, existing TEE systems exhibit inadequate and inefficient support for an emerging (and significant) processing unit, NPU. For instance, commercial TEE systems resort to coarse-grained and static protection approaches for NPUs, resulting in notable performance degradation (10%-20%), limited (or no) multitasking capabilities, and suboptimal resource utilization. In this paper, we present a secure NPU architecture, known as sNPU, which aims to mitigate vulnerabilities inherent to the design of NPU architectures. First, sNPU proposes NPU Guarder to enhance the NPU's access control. Second, sNPU defines new attack surfaces leveraging in-NPU structures like scratchpad and NoC, and designs NPU Isolator to guarantee the isolation of scratchpad and NoC routing. Third, our system introduces a trusted software module called NPU Monitor to minimize the software TCB. Our prototype, evaluated on FPGA, demonstrates that sNPU significantly mitigates the runtime costs associated with security checking (from upto 20% to 0%) while incurring less than 1% resource costs.
暂无评论