With the rapid development of computer technology, all kinds of computer software are widely used in all walks of life. However, in the process of software development and maintenance, software defects are inevitable....
ISBN: 9783031506833; 9783031506840 (print)
We build a message-passing realization of the QR factorization for tall-and-skinny matrices on top of highly parallel linear algebra kernels, such as various types of matrix multiplications and triangular system solves, plus a few small Cholesky decompositions. Our solution uses either the NVIDIA Collective Communications Library (NCCL) or a plain instance of MPI as the message-passing layer, together with the implementations of these kernels in linear algebra libraries, and can run either on clusters of multicore nodes, possibly accelerated with GPUs, or on multi-GPU platforms. The experimental evaluation of our parallel algorithm for the QR factorization on a cluster of 8 nodes with NVIDIA A100 boards shows significant acceleration factors over a code from MAGMA, based on Householder reflectors, that provides the same functionality. In addition, the experiments show fair weak scalability when the problem has many more rows than columns.
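The abstract does not name the algorithm, but a QR factorization built from matrix multiplications, triangular solves, and small Cholesky decompositions matches the well-known CholeskyQR scheme. The following is a minimal single-process sketch under that assumption, not the authors' implementation. The function names are hypothetical, the row blocks stand in for the data held by each rank, and `allreduce_sum` is a plain-Python stand-in for an MPI or NCCL all-reduce. A production code would call BLAS/LAPACK kernels instead of these loops.

```python
import math

def local_gram(block):
    """Gram matrix block^T * block for one row block (a local matrix multiply)."""
    n = len(block[0])
    return [[sum(row[i] * row[j] for row in block) for j in range(n)]
            for i in range(n)]

def allreduce_sum(mats):
    """Stand-in for an all-reduce: element-wise sum of the local matrices."""
    n = len(mats[0])
    return [[sum(m[i][j] for m in mats) for j in range(n)] for i in range(n)]

def cholesky_upper(g):
    """Upper-triangular R with R^T R = g, for a small symmetric positive definite g."""
    n = len(g)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s = g[i][i] - sum(r[k][i] ** 2 for k in range(i))
        r[i][i] = math.sqrt(s)
        for j in range(i + 1, n):
            r[i][j] = (g[i][j] - sum(r[k][i] * r[k][j] for k in range(i))) / r[i][i]
    return r

def solve_right_upper(block, r):
    """Solve Q * r = block for Q, one row at a time (a triangular system solve)."""
    n = len(r)
    q = []
    for row in block:
        x = [0.0] * n
        for j in range(n):
            x[j] = (row[j] - sum(x[k] * r[k][j] for k in range(j))) / r[j][j]
        q.append(x)
    return q

def cholesky_qr(blocks):
    """One CholeskyQR pass over row blocks; returns (Q row blocks, R)."""
    gram = allreduce_sum([local_gram(b) for b in blocks])  # n x n, n = column count
    r = cholesky_upper(gram)                               # small, replicated on every rank
    return [solve_right_upper(b, r) for b in blocks], r
```

Only the small n-by-n Gram matrix is communicated, which is why the scheme suits matrices with many more rows than columns. A single pass loses orthogonality for ill-conditioned inputs, so it is commonly applied twice (CholeskyQR2): factor the first Q again and multiply the two R factors.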
ISBN: 9798350307184 (print)
We propose a distributed bundle adjustment (DBA) method using the exact Levenberg-Marquardt (LM) algorithm for super large-scale datasets. Most existing methods partition the global map into smaller submaps and conduct bundle adjustment within each submap. To fit the parallel framework, they use approximate solutions instead of the LM algorithm, and therefore often give sub-optimal results. In contrast, we use the exact LM algorithm to conduct global bundle adjustment, where the formation of the reduced camera system (RCS) is parallelized and executed in a distributed way. To store the large RCS, we compress it with a block-based sparse matrix compression format (BSMC), which fully exploits its block structure. The BSMC format also enables distributed storage and updating of the global RCS. The proposed method is extensively evaluated and compared with state-of-the-art pipelines using both synthetic and real datasets. Preliminary results demonstrate the efficient memory usage and high scalability of the proposed method compared with the baselines. For the first time, we conducted parallel bundle adjustment using the LM algorithm on a real dataset with 1.18 million images and a synthetic dataset with 10 million images (about 500 times that of the state-of-the-art LM-based BA) on a distributed computing system.
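For orientation, the reduced camera system in textbook bundle adjustment is the Schur complement of the damped normal equations. The derivation below is the standard one, not taken from the paper. The symbols are assumptions introduced here: B and C are the camera and point blocks of the damped approximate Hessian, E is the camera-point coupling block, and v, w are the right-hand sides.

```latex
% Damped normal equations of one LM step, split into camera (c) and point (p) parts
\begin{bmatrix} B & E \\ E^{\top} & C \end{bmatrix}
\begin{bmatrix} \Delta c \\ \Delta p \end{bmatrix}
=
\begin{bmatrix} v \\ w \end{bmatrix}

% Eliminating the points yields the reduced camera system (RCS)
S \, \Delta c = v - E C^{-1} w,
\qquad
S = B - E C^{-1} E^{\top}

% Back-substitution recovers the point updates
\Delta p = C^{-1} \left( w - E^{\top} \Delta c \right)

% C is block diagonal with one small block C_j per 3D point, so the correction
% term is a sum of independent per-point contributions
E C^{-1} E^{\top} = \sum_{j} E_j \, C_j^{-1} E_j^{\top}
```

Because the last sum runs over independent per-point terms, partial sums can be accumulated on different workers and then combined. That is the general property that makes a distributed formation of the RCS possible. How the paper partitions and stores these blocks is not described in the abstract.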
Since modern high performance computing systems are evolving towards diverse and heterogeneous architectures, the emergence of high-level portable programming models leads to a particular focus on performance portabil...
ISBN: 9798350357905 (print)
The proceedings contain 89 papers. The topics discussed include: analysis of large capacity reversible data hiding for ECG using PEE and regression; a comparative study of tapering methods and transfer characteristics in Fabry-Perot interferometric applications; a critical review on control techniques for parallel operated inverters in grid connected and standalone model; a review of battery energy storage system optimization: current state-of-the-art and future trends; advancements for improved plant disease and pest identification: a survey; an adaptive local measurement-based fault detection method proposed for off-grid small-scaled low-voltage DC microgrids; analysis of power link budget and interference of high altitude platform station technology in Nusa Tenggara Timur; archery bow micro-movement profiling using inertial measurement unit; and bacterial foraging optimization based least square support vector machine for short-term electricity load forecasting.
Load balancing (LB) is simply the methodical distribution of load among many servers. To expedite the handling of customer requests, the fog server manages the substantial data on the cloud server. Data requirements are increasing, an...
With a high proportion of power electronic equipment, distributed energy, and other new energy sources connected to the grid, the power system exhibits low inertia and low damping; these kinds of power grid characteristics in p...
With the rapid development of the new generation of information technologies, including cloud computing, 5G, and the Internet of Things, data outsourcing storage in the cloud has brought great convenience to data stor...
ISBN: 9798350337631 (print)
The proceedings contain 49 papers. The topics discussed include: performance analysis and benchmarking of a temperature downscaling deep learning model; an auto-tuning method for high-bandwidth low-latency approximate interconnection networks; a highly scalable high-performance Lagrangian transport and diffusion model for marine pollutants assessment; summarizing task-based applications behavior over many nodes through progression clustering; revisiting self-adaptation for efficient decision-making at run-time in parallel executions; priority-aware inter-server receive side scaling; AMG preconditioners based on parallel hybrid coarsening and multi-objective graph matching; dynamic resource partitioning for multi-tenant systolic array based DNN accelerator; improving inference time in multi-TPU systems with profiled model segmentation; a tamper-resistant storage framework for smart grid security; and content-aware auto-scaling of stream processing applications on container orchestration platforms.
Support for an ever-expanding range of distributed and parallel strategies and support for performance analysis are constantly evolving. We propose message passing interface (MPI) and open multi-processing (OpenMP) st...