ISBN (print): 9783031656262; 9783031656279
Satisfiability Modulo Theories (SMT) solving over arithmetic theories has significant applications in many important domains. Previous efforts have mainly been devoted to improving the techniques and heuristics of sequential SMT solvers. With the growth of computing resources, a promising direction for boosting performance is parallel, and even distributed, SMT solving. We explore this potential from a divide-and-conquer perspective and propose a novel dynamic parallel framework with variable-level partitioning. To the best of our knowledge, this is the first attempt to perform variable-level partitioning for arithmetic theories. Moreover, we enhance the interval constraint propagation (ICP) algorithm, coordinate it with Boolean propagation, and integrate it into our variable-level partitioning strategy. Our partitioning algorithm exploits propagation information, enabling efficient formula simplification and search-space pruning. We apply our method to three state-of-the-art SMT solvers, CVC5, OpenSMT2, and Z3, yielding efficient parallel SMT solvers. Experiments on benchmarks of linear and nonlinear arithmetic over both real and integer variables show that our variable-level partitioning method substantially improves on previous partitioning strategies and performs particularly well on nonlinear theories.
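To make the idea of propagation-guided variable-level partitioning concrete, the following is a minimal sketch under simplifying assumptions: constraints are linear inequalities of the form sum(c_i * x_i) <= bound, propagation is a single naive interval pass, and partitioning simply bisects one chosen variable's interval. The functions `propagate_le` and `partition_on_variable` are hypothetical names for illustration; the paper's enhanced ICP, its coordination with Boolean propagation, and its variable-selection heuristics are not reproduced here.

```python
from fractions import Fraction
from typing import Dict, List, Optional, Tuple

# A box maps each variable name to a closed interval [lo, hi].
Box = Dict[str, Tuple[Fraction, Fraction]]
# A linear constraint sum(c_i * x_i) <= bound, stored as (coefficients, bound).
Constraint = Tuple[Dict[str, int], int]


def propagate_le(constraint: Constraint, box: Box) -> Optional[Box]:
    """One interval-propagation pass over sum(c_i * x_i) <= bound.

    Returns a tightened copy of the box, or None if the box becomes empty."""
    coeffs, bound = constraint
    new = dict(box)
    for v, c in coeffs.items():
        # Smallest possible value of the other terms over the current box.
        rest_min = sum(min(k * new[u][0], k * new[u][1])
                       for u, k in coeffs.items() if u != v)
        limit = Fraction(bound) - rest_min  # so c * x_v <= limit must hold
        lo, hi = new[v]
        if c > 0:
            hi = min(hi, limit / c)
        elif c < 0:
            lo = max(lo, limit / c)  # dividing by c < 0 flips the inequality
        if lo > hi:
            return None
        new[v] = (lo, hi)
    return new


def partition_on_variable(constraints: List[Constraint], box: Box, var: str) -> List[Box]:
    """Bisect one variable's interval; keep only sub-boxes that survive propagation."""
    lo, hi = box[var]
    mid = (lo + hi) / 2
    cubes = []
    for sub in ({**box, var: (lo, mid)}, {**box, var: (mid, hi)}):
        for constraint in constraints:
            sub = propagate_le(constraint, sub)
            if sub is None:
                break  # infeasible half: prune it instead of handing it to a solver
        if sub is not None:
            cubes.append(sub)
    return cubes


# Hypothetical instance: x + y <= 4 and x >= 3 (written as -x <= -3), x, y in [0, 10].
box = {"x": (Fraction(0), Fraction(10)), "y": (Fraction(0), Fraction(10))}
constraints = [({"x": 1, "y": 1}, 4), ({"x": -1}, -3)]
cubes = partition_on_variable(constraints, box, "x")
# One cube survives: x in [3, 4], y in [0, 4]; the half x in [5, 10] is pruned.
```

Each surviving cube is a smaller, already-simplified subproblem that could be dispatched to a separate solver instance, which is the intuition behind using propagation information to guide the split.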
ISBN (print): 9783031506833; 9783031506840
The rise of the Internet of Things and fog computing has substantially increased the number of interconnected devices at the edge of the network. As a result, many computations are now performed in the fog, generating vast amounts of data. To process this data in near real time, stream processing is typically employed because it handles continuous streams of information efficiently and scalably. However, most stream processing approaches do not consider the underlying network devices as candidate resources for processing data. Moreover, many existing works do not account for the network latency incurred by performing computations on multiple devices in a distributed way. Consequently, existing stream processing approaches may not fully exploit fog computing resources. To address this, we formulate an optimization problem for utilizing the existing fog resources and design heuristics that solve it efficiently. Furthermore, we integrate our heuristics into Apache Storm and perform experiments that show latency benefits compared to alternatives.
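The abstract does not give the heuristics themselves, so the following is only a minimal sketch of the general idea of latency-aware operator placement in which network devices count as candidate resources. It assumes a linear pipeline of operators with integer CPU demands, per-node capacities, and symmetric link latencies, and places each operator greedily on the feasible node closest to its upstream operator. The names `greedy_place` and `link_latency`, and the topology, are hypothetical; this is not the paper's formulation and uses no Apache Storm API.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def link_latency(latency: Dict[Link, float], a: str, b: str) -> float:
    """Symmetric link latency; co-located operators pay no network cost."""
    if a == b:
        return 0.0
    return latency.get((a, b), latency.get((b, a), float("inf")))


def greedy_place(operators: List[Tuple[str, int]], capacity: Dict[str, int],
                 latency: Dict[Link, float], source: str) -> Tuple[Dict[str, str], float]:
    """Place a linear pipeline, each operator on the feasible node closest to its upstream."""
    free = dict(capacity)
    placement: Dict[str, str] = {}
    upstream, total = source, 0.0
    for name, demand in operators:
        candidates = [n for n, c in free.items() if c >= demand]
        if not candidates:
            raise ValueError(f"no node has capacity for operator {name!r}")
        # Prefer the lowest link latency; break ties by node name for determinism.
        best = min(candidates, key=lambda n: (link_latency(latency, upstream, n), n))
        free[best] -= demand
        placement[name] = best
        total += link_latency(latency, upstream, best)
        upstream = best
    return placement, total


# Hypothetical topology: a router and a gateway at the edge, a cloud data center behind them.
capacity = {"router": 2, "gateway": 4, "cloud": 100}
latency = {("router", "gateway"): 2.0, ("gateway", "cloud"): 20.0, ("router", "cloud"): 25.0}
operators = [("parse", 1), ("filter", 1), ("aggregate", 3)]
placement, total = greedy_place(operators, capacity, latency, source="router")
# placement: parse and filter on the router, aggregate on the gateway; total latency 2.0
```

In this toy instance the router itself hosts the first two operators, which is the kind of opportunity a placement that ignores network devices would miss; a real formulation would also weigh processing delays and bandwidth.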
ISBN (print): 9798400717932
In this paper, we consider the efficient computation of all eigenvalues and eigenvectors of symmetric Hierarchically Semiseparable (HSS) matrices, whose off-diagonal blocks are low-rank and have hierarchical bases. The state of the art is SuperDC, a divide-and-conquer algorithm that computes eigenvalues and eigenvectors an order of magnitude faster than popular and commercial solvers. We improve on the state of the art and present novel shared- and distributed-memory parallel algorithms for computing eigenvalues of HSS matrices. We exploit the recursive divide-and-conquer structure of SuperDC to parallelize the eigenvalue computation, present an analysis of span and available parallelism, and optimize the original SuperDC algorithm to reduce the storage requirement from O(N^2) to O(N) for banded matrices. We systematically evaluate different parallel programming paradigms, scheduling policies, and scalability configurations. In the shared-memory implementations, OpenMP versions perform better than Cilk versions and work stealing offers no significant performance advantage; in the distributed-memory implementations, asynchronous communication yields better performance than barrier-based communication. We also identify the input decomposition at which the parallel implementations achieve the best speedup. For input symmetric matrices of different sparsity structures and sizes ranging from 4096 to 256k rows, on up to 512 cores, the implementations scale well and achieve speedups of up to 147x over the available SuperDC implementation.
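Why banded inputs admit O(N) storage can be illustrated with a generic symmetric band layout, similar in spirit to LAPACK's band storage: only the main diagonal and the b sub-diagonals are kept, so memory grows as (b + 1) * N rather than N^2 for fixed bandwidth b. This is a minimal sketch of that general idea only; it is not SuperDC's HSS representation or the paper's specific memory optimization, and `to_symmetric_band` and `symmetric_band_matvec` are hypothetical helpers.

```python
from typing import List

Matrix = List[List[float]]


def to_symmetric_band(A: Matrix, b: int) -> Matrix:
    """Keep only the main diagonal and the b sub-diagonals of a symmetric matrix.

    band[k][j] holds A[j + k][j] (equal to A[j][j + k] by symmetry)."""
    n = len(A)
    return [[A[j + k][j] for j in range(n - k)] for k in range(b + 1)]


def symmetric_band_matvec(band: Matrix, x: List[float]) -> List[float]:
    """Compute y = A x using only the band: O((b + 1) * N) memory and work."""
    n = len(x)
    y = [band[0][i] * x[i] for i in range(n)]
    for k in range(1, len(band)):
        for j, a in enumerate(band[k]):
            y[j + k] += a * x[j]  # lower-triangle entry A[j + k][j]
            y[j] += a * x[j + k]  # mirrored upper-triangle entry A[j][j + k]
    return y


# Symmetric tridiagonal example (b = 1): 7 stored entries instead of 16.
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
band = to_symmetric_band(A, 1)
y = symmetric_band_matvec(band, [1.0, 2.0, 3.0, 4.0])  # [0.0, 0.0, 0.0, 5.0]
```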
Edge computing is considered a promising architecture for handling latency-sensitive and computationally intensive tasks. The lack of consideration for the timing of jobs and their unique topology in the existing rese...
ISBN (print): 9789819628636
The proceedings contain 76 papers. The special focus in this conference is on Network and Parallel Computing. The topics include: AsymFB: Accelerating LLM Training Through Asymmetric Model Parallelism; DaCP: Accelerating Synchronization-Free SpTRSV via GPU-Friendly Data Communication and Parallelism Strategies; Diagnosability of the Lexicographic Product of Paths and Complete Bipartite Graphs Under PMC Model; DTuner: A Construction-Based Optimization Method for Dynamic Tensor Operators Accelerating; Efficient Implementation of the LOBPCG Algorithm on a CPU-GPU Cluster; HP-CSF: An GPU Optimization Method for CP Decomposition of Incomplete Tensors; JediGAN: A Fully Decentralized Training of GAN with Adaptive Discriminator Averaging and Generator Selection; Optimizing Vo-Viso: A Modified Methodology to Parallel Computing with Isolating Data in Memristor Arrays; Parallel Computation of the Combination of Two Point Operations in Conic Curves Cryptosystem over GF(2^n) Using Tile Self-assembly; Parallel Construction of Independent Spanning Trees on 3-ary n-cube Networks; SpecInF: Exploiting Idle GPU Resources in Distributed DL Training via Speculative Inference Filling; swDarknet: A Heterogeneous Parallel Deep Learning Framework Suitable for SW26010 Pro Processor; VConv: Autotiling Convolution Algorithm Based on MLIR for Multi-core Vector Accelerators; ACH-Code: An Efficient Erasure Code to Reduce Average Repair Cost in Cloud Storage Systems of Multiple Availability Zones; CMS: A Computility Resource Status Management and Storage Framework; Fast Memory Disaggregation with SwiftSwap; HASLB: Huge Page Allocation Strategy Optimized for Load-Balance in Parallel Computing Programs; LightFinder: Finding Persistent Items with Small Memory; MiDedup: A Restore-Friendly Deduplication Method on Docker Image Storage Systems; SPLR: A Selective Packet Loss Recovery for Improved RDMA Performance; A Cluster-Based Platoon Formation Scheme for Realistic Automated Vehicle Platooning; AnaNET: Anatomical Network fo...
Edge computing has transformed machine learning by moving computation closer to the data sources, thereby reducing latency. The ever-increasing volume of data has necessitated forming clusters of edge devices, possibly w...
ISBN (print): 9789819628292
The proceedings contain 76 papers. The special focus in this conference is on Network and Parallel Computing. The topics include: AsymFB: Accelerating LLM Training Through Asymmetric Model Parallelism; DaCP: Accelerating Synchronization-Free SpTRSV via GPU-Friendly Data Communication and Parallelism Strategies; Diagnosability of the Lexicographic Product of Paths and Complete Bipartite Graphs Under PMC Model; DTuner: A Construction-Based Optimization Method for Dynamic Tensor Operators Accelerating; Efficient Implementation of the LOBPCG Algorithm on a CPU-GPU Cluster; HP-CSF: An GPU Optimization Method for CP Decomposition of Incomplete Tensors; JediGAN: A Fully Decentralized Training of GAN with Adaptive Discriminator Averaging and Generator Selection; Optimizing Vo-Viso: A Modified Methodology to Parallel Computing with Isolating Data in Memristor Arrays; Parallel Computation of the Combination of Two Point Operations in Conic Curves Cryptosystem over GF(2^n) Using Tile Self-assembly; Parallel Construction of Independent Spanning Trees on 3-ary n-cube Networks; SpecInF: Exploiting Idle GPU Resources in Distributed DL Training via Speculative Inference Filling; swDarknet: A Heterogeneous Parallel Deep Learning Framework Suitable for SW26010 Pro Processor; VConv: Autotiling Convolution Algorithm Based on MLIR for Multi-core Vector Accelerators; ACH-Code: An Efficient Erasure Code to Reduce Average Repair Cost in Cloud Storage Systems of Multiple Availability Zones; CMS: A Computility Resource Status Management and Storage Framework; Fast Memory Disaggregation with SwiftSwap; HASLB: Huge Page Allocation Strategy Optimized for Load-Balance in Parallel Computing Programs; LightFinder: Finding Persistent Items with Small Memory; MiDedup: A Restore-Friendly Deduplication Method on Docker Image Storage Systems; SPLR: A Selective Packet Loss Recovery for Improved RDMA Performance; A Cluster-Based Platoon Formation Scheme for Realistic Automated Vehicle Platooning; AnaNET: Anatomical Network fo...
Matrix multiplication is crucial in scientific computing, but it demands substantial resources. We propose a framework for effectively utilizing heterogeneous GPUs for large matrix multiplication. By splitting matrices...
ISBN (print): 9798400708893
The proceedings contain 17 papers. The topics discussed include: portable implementations of work stealing;sKokkos: enabling Kokkos with transparent device selection on heterogeneous systems using OpenACC;parallelized remapping algorithms for km-scale global weather and climate simulations with icosahedral grid system;approximate block diagonalization of symmetric matrices using quantum annealing;QUBO formulation using inequalities for problems with complex constraints;low-latency communication in RISC-V clusters;flexible systolic array platform on virtual 2-D multi-FPGA plane;an efficient task-parallel pipeline programming framework;and task-based low-rank hybrid parallel Cholesky factorization for distributed memory environment.
ISBN (print): 9798400705977
Quantum annealers such as those from D-Wave Systems implement adiabatic quantum computing to solve optimization problems, but their analog nature and limited control functionality make it challenging to correct or mitigate errors. As quantum computing advances towards applications, effective error suppression is an important research goal. We propose a new approach called replication-based mitigation (RBM), based on parallel quantum annealing. In RBM, physical qubits representing the same logical qubit are dispersed across different copies of the problem embedded in the hardware. This mitigates hardware biases, is compatible with the limited qubit connectivity of current annealers, and is suited to available noisy intermediate-scale quantum (NISQ) annealers. Our experimental analysis shows that RBM provides solution quality on par with previous methods while being compatible with a much wider range of hardware connectivity patterns. Compared with standard quantum annealing without error mitigation, RBM consistently improves the energies and ground-state probabilities across parameterized problem sets.
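The abstract does not state how the readouts of the replicated copies are combined, so the following sketch assumes a simple majority vote per logical spin across copies and scores the result with the standard Ising energy. It only illustrates why spreading one logical qubit over several physically distinct copies can wash out a bias affecting a single copy; `majority_decode` and the example data are hypothetical and are not RBM's documented decoding step.

```python
from typing import Dict, List, Tuple

Spins = Dict[int, int]


def ising_energy(h: Dict[int, float], J: Dict[Tuple[int, int], float], spins: Spins) -> float:
    """E(s) = sum_i h_i s_i + sum_(i,j) J_ij s_i s_j for spins s_i in {-1, +1}."""
    energy = sum(h[i] * spins[i] for i in h)
    energy += sum(w * spins[i] * spins[j] for (i, j), w in J.items())
    return energy


def majority_decode(replicas: List[Spins]) -> Spins:
    """Combine one logical problem's readouts from several embedded copies.

    Each logical spin is set by majority vote over the copies; ties go to +1."""
    return {v: 1 if sum(r[v] for r in replicas) >= 0 else -1 for v in replicas[0]}


# Ferromagnetic 3-spin chain: ground states are all +1 or all -1, with energy -2.
h = {0: 0.0, 1: 0.0, 2: 0.0}
J = {(0, 1): -1.0, (1, 2): -1.0}
replicas = [{0: 1, 1: 1, 2: 1},   # copy A
            {0: 1, 1: -1, 2: 1},  # copy B: spin 1 flipped, e.g. by a biased physical qubit
            {0: 1, 1: 1, 2: -1}]  # copy C: spin 2 flipped
decoded = majority_decode(replicas)
energy = ising_energy(h, J, decoded)  # decoded is all +1, energy -2.0
```

Here copies B and C alone reach energies 2.0 and 0.0, while the vote recovers a ground state; with an odd number of copies, ties cannot occur.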