ISBN (Print): 9789819615506
The proceedings contain 131 papers. The special focus in this conference is on Algorithms and Architectures for Parallel Processing. The topics include: MARO: Enabling Full MPI Automatic Refactoring in DSL-Based Programming Framework; SSC: An SRAM-Based Silence Computing Design for On-chip Memory; TP-BFT: A Faster Asynchronous BFT Consensus with Parallel Structure; LTP: A Lightweight On-Chip Temporary Prefetcher for Data-Dependent Memory Accesses; A Neural Network-Based PUF Protection Method Against Machine Learning Attack; Compression Format and Systolic Array Structure Co-design for Accelerating Sparse Matrix Multiplication in DNNs; Multidimensional Intrinsic Identity Construction and Dynamic Seamless Authentication Schemes in IoT Environments; Invisible Backdoor Attack with Image Contours Triggers; Finestra: Multi-aggregator Swarm Learning for Gradient Leakage Defense; DIsFU: Protecting Innocent Clients in Federated Unlearning; Multiple-Round Aggregation of Abstract Semantics for Secure Heterogeneous Federated Learning; Dynamic Privacy Protection with Large Language Model in Social Networks; A Dynamic Symmetric Searchable Encryption Scheme for Rapid Conjunctive Queries; A Data Watermark Scheme Based on Data Converted Bitmap for Data Trading; Distributed Incentive Algorithm for Fine-Grained Offloading in Vehicular Ad Hoc Networks; Mitigating Over-Unlearning in Machine Unlearning with Synthetic Data Augmentation; AW-YOLOv9: Adverse Weather Conditions Adaptation for UAV Detection; Efficient and Privacy-Preserving Ranking-Based Federated Learning; On-Chain Dynamic Policy Evaluation for Decentralized Access Control; DPG-FairFL: A Dual-Phase GAN-Based Defense Framework Against Image-Based Fairness Data Poisoning Attacks in Federated Learning.
ISBN (Print): 9789819615445
The rapid growth of cloud computing has brought new challenges in Parallel Batch Machine Scheduling (PBMS), particularly when incorporating malleability and rejection constraints. This has led to the Parallel Batch Ma...
Deep Learning (DL), especially with Large Language Models (LLMs), brings benefits to various areas. However, DL training systems usually leave substantial GPU resources idle due to many factors, such as resource alloc...
Edge computing is a rapidly developing research area known for its ability to reduce latency and improve energy efficiency, and it also has potential for green computing. Many geographically distributed edge servers...
ISBN (Print): 9798331524937
Over the last decade, radio astronomy has entered a new era: the advent of the Square Kilometer Array (SKA), preceded by its pathfinders, will produce a huge amount of data that will be hard to process with a traditional approach. This means that the current state-of-the-art software for data reduction and imaging will have to be re-modeled to face this data challenge. To manage such an increase in data size and computational requirements, scientists need to exploit modern high-performance computing (HPC) architectures. In particular, heterogeneous systems, based on complex combinations of CPUs, accelerators, high-speed networks, and composite storage devices, need to be used efficiently and effectively. In this paper, we present an overview of Radio Imaging Code Kernels (RICK [1][2][3]), a code able to perform the most computationally demanding steps of the w-stacking gridder algorithm by exploiting distributed parallelism and GPU acceleration. GPU offloading is possible through CUDA, HIP, and OpenMP, aiming at the widest possible usability across architectures. After detailing the (multi-)GPU approach to the problem and listing the new code implementations, we analyze its performance, considering both the computational and communication workloads. We show how the full, distributed GPU offload of the code, the first of its kind and crucial for dealing with increasingly large interferometric data, represents not only an extremely fast and optimized approach, but also the greenest one compared to its parallel CPU counterpart. The code, now publicly available, has been tested with a wide variety of modern interferometers and SKA pathfinders. This represents, to date, the first example of radio imaging software fully enabled on GPUs, making it a potential state-of-the-art approach for the upcoming SKA. Finally, we also present future perspectives for the code, which is planned to be converted into a library and possibly used by any of the most...
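To make the gridding step concrete, the sketch below illustrates, in a purely hypothetical and simplified form, how visibilities can be scattered onto a uv-grid on the GPU with atomic adds, in the spirit of the w-stacking approach the abstract describes. It is not RICK's implementation: the function names, the nearest-neighbour assignment, the single-w-plane simplification, and the Numba CUDA backend are all assumptions chosen for brevity, whereas a real w-stacking imager grids with a convolution kernel over multiple w-planes and, as in RICK, targets CUDA, HIP, or OpenMP across distributed nodes.

```python
# Hypothetical, simplified sketch of GPU gridding (not RICK's code).
import numpy as np
from numba import cuda


@cuda.jit
def grid_visibilities(u_pix, v_pix, vis_re, vis_im, grid_re, grid_im):
    """Scatter each visibility onto its nearest grid cell with atomic adds."""
    i = cuda.grid(1)
    if i < u_pix.shape[0]:
        ix = int(u_pix[i])
        iy = int(v_pix[i])
        # Atomics are needed because many visibilities may land on the same cell.
        cuda.atomic.add(grid_re, (iy, ix), vis_re[i])
        cuda.atomic.add(grid_im, (iy, ix), vis_im[i])


def image_one_w_plane(u_pix, v_pix, vis, grid_size=1024):
    """Grid one w-plane on the GPU, then FFT the grid on the host with NumPy."""
    grid_re = cuda.to_device(np.zeros((grid_size, grid_size), np.float32))
    grid_im = cuda.to_device(np.zeros((grid_size, grid_size), np.float32))
    threads = 256
    blocks = (u_pix.shape[0] + threads - 1) // threads
    grid_visibilities[blocks, threads](
        cuda.to_device(u_pix), cuda.to_device(v_pix),
        cuda.to_device(vis.real.astype(np.float32)),
        cuda.to_device(vis.imag.astype(np.float32)),
        grid_re, grid_im,
    )
    g = grid_re.copy_to_host() + 1j * grid_im.copy_to_host()
    # Inverse FFT of the gridded visibilities gives a (dirty) image plane.
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(g))).real
```

In a distributed setting, each rank would grid its own share of visibilities into a local buffer and the partial grids would then be reduced across ranks before the FFT, which is where the communication workload discussed in the abstract comes from.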
Federated Learning (FL) is vulnerable to backdoor attacks through data poisoning if the data is not scrutinized, as malicious participants can inject backdoor triggers into normal samples, leading to poisoned updates. D...
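For readers unfamiliar with the attack model, the short sketch below shows one common way a malicious client could stamp a trigger into its local data before computing an update; the 3x3 corner patch, poisoning fraction, and target label are hypothetical choices for illustration, not the scheme studied in the paper.

```python
# Hypothetical illustration of data-poisoning with a pixel-pattern backdoor trigger.
import torch


def poison_batch(images, labels, target_label=0, fraction=0.2):
    """Stamp a trigger on a fraction of samples (NCHW, values in [0, 1]) and flip their labels."""
    images, labels = images.clone(), labels.clone()
    n_poison = int(fraction * images.size(0))
    images[:n_poison, :, -3:, -3:] = 1.0  # white 3x3 square in the bottom-right corner
    labels[:n_poison] = target_label      # every triggered sample maps to the attacker's class
    return images, labels


# The client then trains on the poisoned batch as usual, e.g.:
# x, y = poison_batch(x, y); loss = criterion(model(x), y); loss.backward()
```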
Distributed deep neural network training necessitates efficient GPU collective communications, which are inherently susceptible to deadlocks. GPU collective deadlocks arise easily in distributed deep learning applicat...
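As a concrete illustration of how easily such deadlocks arise (not taken from the paper), the sketch below uses PyTorch's NCCL backend: two ranks issue the same collectives in different orders, so each GPU ends up waiting on a collective its peer never matches and the job hangs. The script name and launch command are hypothetical.

```python
# Hypothetical demo of a GPU collective deadlock; running it will hang by design.
# Assumed launch: torchrun --nproc_per_node=2 deadlock_demo.py (two GPUs).
import torch
import torch.distributed as dist


def main():
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)
    x = torch.ones(1024, device="cuda")
    y = torch.zeros(1024, device="cuda")

    if rank == 0:
        dist.all_reduce(x)        # rank 0 issues all_reduce first ...
        dist.broadcast(y, src=0)
    else:
        dist.broadcast(y, src=0)  # ... rank 1 issues broadcast first: the
        dist.all_reduce(x)        # mismatched order leaves both ranks stuck.

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```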
With advancements in 3D reconstruction and computer graphics, Neural Radiance Fields (NeRF) has emerged as a powerful technique in novel view synthesis, holding potential for immersive extended reality (XR) and gaming...
Transformer-based large language models (LLMs) have become a dominant force in natural language processing, advancing both research and industry. As model sizes have grown from billions to hundreds of billions of para...