Performance in HPC applications today depends heavily on the efficiency of MPI, the de facto message-passing library for exploiting parallelism. Features such as multithreading and the overlap of communication with computation are continuously studied to adapt to new platforms and to larger numbers of processing units, such as GPU platforms. Recently, the MPI-4.0 standard introduced partitioned point-to-point communication primitives to enhance the overlap of computation and communication. This paper introduces an extension to MPI that applies partitioned communication to MPI reduction primitives. Traditional reductions process the complete input vector only after the GPU computation has finished. In contrast, our approach uses message partitioning to perform the reduction incrementally: individual partitions of the input vector are processed as they become available, removing the need to wait for the GPU computation to complete before starting the reduction. Our results show promising benefits, particularly for large message sizes. However, synchronization points remain potential bottlenecks and require careful analysis.
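The incremental idea in the abstract above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions: it does not use MPI or a GPU, and the names `full_reduce`, `partitioned_reduce`, and `ready_order` are hypothetical, not taken from the paper or from the MPI-4.0 standard. It shows that reducing partitions in whatever order they become ready yields the same result as reducing the complete vectors at the end.

```python
from functools import reduce
import operator


def full_reduce(vectors, op=operator.add):
    """Baseline: element-wise reduction once every complete vector is available."""
    return [reduce(op, column) for column in zip(*vectors)]


def partitioned_reduce(vectors, num_partitions, ready_order, op=operator.add):
    """Reduce one partition at a time, in the order partitions become ready.

    `ready_order` is a hypothetical stand-in for the order in which a
    producer (e.g. a GPU kernel) marks partitions as complete.
    """
    n = len(vectors[0])
    size = n // num_partitions
    # Partition boundaries; the last partition absorbs any remainder.
    bounds = [
        (p * size, n if p == num_partitions - 1 else (p + 1) * size)
        for p in range(num_partitions)
    ]
    result = [None] * n
    for p in ready_order:
        lo, hi = bounds[p]
        # Only this partition's elements are reduced at this step.
        for i in range(lo, hi):
            result[i] = reduce(op, (v[i] for v in vectors))
    return result


# Three hypothetical ranks, each contributing a vector of 8 elements.
ranks = [[r * 10 + i for i in range(8)] for r in range(3)]
out = partitioned_reduce(ranks, num_partitions=4, ready_order=[2, 0, 3, 1])
assert out == full_reduce(ranks)
```

In a real implementation the per-partition step would overlap with the computation still producing later partitions; the simulation only demonstrates that the partition order does not affect the reduced result.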
Defining and measuring trust in dynamic, multiagent teams is important in a range of contexts, particularly in defense and security domains. Team members should be trusted to work towards agreed goals and in accordanc...
Prototype-based clustering algorithms have garnered considerable attention in the field of machine learning due to their efficiency and interpretability. Nonetheless, these algorithms often face performance degradatio...
Underdamped Langevin Monte Carlo (ULMC) is an algorithm used to sample from unnormalized densities by leveraging the momentum of a particle moving in a potential well. We provide a novel analysis of ULMC, motivated by...
In this paper we consider Bayesian parameter inference associated to a class of partially observed stochastic differential equations (SDE) driven by jump processes. Such type of models can be routinely found in applic...
Wireless sensor network (WSN) attacks seek to disrupt or eliminate the network's ability to execute its expected duties. Penetration testing is a wireless sensor network defence that detects unknown threats. Becau...
We study global quadratic ℒ2 performance in probability (quadratic stability and ℒ2 gain γ in probability: abbreviated as GQℒ2(γ)-P) for switched systems which are composed of a finite set of linear stochastic subsy...
ISBN (digital): 9798331504960
ISBN (print): 9798331504977
Generative Adversarial Networks (GANs) have revolutionized image synthesis by using two neural networks, a generator and a discriminator, to create realistic images from random noise. In this adversarial process, the generator attempts to fool the discriminator, which tries to distinguish real images from generated ones. Advanced variants such as conditional GANs and CycleGANs enable tasks such as conditional image generation and style transfer. Our paper presents improvements in image quality and training stability for GANs by introducing new training methods and loss function modifications that address issues such as mode collapse. Evaluated on CelebA and CIFAR-10, our model outperforms previous GANs with Inception Scores of 9.69 and 10.79 and Fréchet Inception Distances of 7.91 and -9.69, respectively. These results demonstrate better convergence, more stable training, and higher-quality image generation, establishing a new benchmark for GAN research and applications.
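For readers unfamiliar with the adversarial process described above, the following is a minimal sketch of the standard GAN objective for a single real/generated pair. It is an assumption-laden illustration, not the modified loss functions the paper proposes (the abstract does not specify them). The function names are hypothetical, and `d_real` and `d_fake` stand for the discriminator's output probabilities on a real and a generated image.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Standard GAN discriminator loss for one real and one generated sample.

    The discriminator is rewarded for scoring real images near 1
    and generated images near 0.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Non-saturating generator loss: the generator is rewarded when the
    discriminator scores its output near 1, i.e. when it is fooled."""
    return -math.log(d_fake)


# A discriminator that cannot tell real from generated outputs 0.5 for both.
undecided_d = discriminator_loss(0.5, 0.5)
undecided_g = generator_loss(0.5)

# A confident, correct discriminator gives itself a low loss
# and the generator a high one.
confident_d = discriminator_loss(0.99, 0.01)
confident_g = generator_loss(0.01)
```

The two losses pull in opposite directions on `d_fake`, which is the source of the training instability that the abstract says its modifications aim to reduce.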
Over-the-air computation (AirComp) integrates analog communication with task-oriented computation, serving as a key enabling technique for communication-efficient federated learning (FL) over wireless networks. Howeve...
ISBN (digital): 9798350353006
ISBN (print): 9798350353013
Hyperspectral image (HSI) restoration aims at recovering clean images from degraded observations and plays a vital role in downstream tasks. Existing model-based methods have limitations in accurately modeling complex image characteristics with handcrafted priors, and deep learning-based methods suffer from poor generalization ability. To alleviate these issues, this paper proposes an unsupervised HSI restoration framework with a pre-trained diffusion model (HIR-Diff), which restores the clean HSIs from the product of two low-rank components, i.e., the reduced image and the coefficient matrix. Specifically, the reduced image, which has a low spectral dimension, lies in the image field and can be inferred from our improved diffusion model, where a new guidance function with a total variation (TV) prior is designed to ensure that the reduced image can be well sampled. The coefficient matrix can be effectively pre-estimated based on singular value decomposition (SVD) and rank-revealing QR (RRQR) factorization. Furthermore, a novel exponential noise schedule is proposed to accelerate the restoration process (about 5× acceleration for denoising) with little performance decrease. Extensive experimental results validate the superiority of our method in both performance and speed on a variety of HSI restoration tasks, including HSI denoising, noisy HSI super-resolution, and noisy HSI inpainting. The code is available at https://***/LiPang/HIRDiff.