In distributed machine learning, Federated Learning (FL) has become a game-changing paradigm that allows for cooperative model training while protecting data privacy and taking regulatory compliance into account and w...
Amid global warming and escalating extreme weather events, the impact of indoor environmental quality on human health and public hygiene is gaining prominence. Environmental parameters exist essentially as fields, which a...
The development of the Internet of Things has increased the demand for real-time communication and computing on user equipment (UEs), which strains the limited battery capacity and computing power of UEs. ...
ISBN (print): 9798400717932
Accurately calculating the electronic structure of strongly correlated chemical systems requires a detailed description of both static and dynamical electron correlation, which poses a significant challenge in ab initio quantum chemistry. Although high memory and computational demands generally limit these calculations to relatively small systems, the computational capabilities of modern GPUs open new avenues for pushing these limits. However, the complex control flow inherent in the computation notably impairs GPU performance. Furthermore, the large disparity in computational load across different branches causes load imbalance, which hinders large-scale simulations. In this work, we introduce PASCI, a heterogeneous parallel computing framework designed to quickly and efficiently parallelize the determinant-based computation of dynamical correlation energy. The features of the PASCI framework include (1) a divergence-avoiding GPU algorithm, (2) a three-level load-mapping strategy that ensures load balance across processors, GPU warps, and GPU threads, (3) performance models for memory footprint and computation, and (4) seamless integration with existing quantum chemistry software. Experimental results on an NVIDIA A100 GPU show that our new GPU algorithm achieves an average 6.6x (up to 13.8x) improvement in peak performance and a speedup of two to four orders of magnitude in practical usage compared with the original GPU implementation. Moreover, PASCI exhibits excellent scalability, highlighting its potential as a powerful high-performance computing tool for complex quantum chemistry research.
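The load-imbalance problem described above can be seen in miniature when tasks of very uneven cost are split across workers. The minimal sketch below uses the classic greedy longest-processing-time-first (LPT) heuristic and compares it with a naive contiguous split. It is a generic illustration, not PASCI's three-level load-mapping strategy; the function names and the per-branch cost values are hypothetical.

```python
import heapq
from typing import List, Tuple


def lpt_assign(costs: List[float], n_workers: int) -> Tuple[List[List[int]], List[float]]:
    """Greedy longest-processing-time-first (LPT) assignment (hypothetical helper).

    Tasks are visited in descending cost order and each one goes to the
    currently least-loaded worker. Returns per-worker task indices and loads.
    """
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    # Min-heap of (current_load, worker_id): the root is the least-loaded worker.
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment: List[List[int]] = [[] for _ in range(n_workers)]
    loads = [0.0] * n_workers
    # Place expensive tasks first so that cheap tasks can fill the remaining gaps.
    for task in sorted(range(len(costs)), key=lambda i: costs[i], reverse=True):
        load, w = heapq.heappop(heap)
        assignment[w].append(task)
        loads[w] = load + costs[task]
        heapq.heappush(heap, (loads[w], w))
    return assignment, loads


def naive_assign_loads(costs: List[float], n_workers: int) -> List[float]:
    """Contiguous block split for comparison: worker w gets one consecutive chunk."""
    chunk = -(-len(costs) // n_workers)  # ceiling division
    return [sum(costs[w * chunk:(w + 1) * chunk]) for w in range(n_workers)]


if __name__ == "__main__":
    # Hypothetical per-branch costs with a heavy head and a light tail.
    costs = [100.0, 90.0, 5.0, 4.0, 3.0, 3.0, 2.0, 1.0]
    _, loads = lpt_assign(costs, 2)
    print("LPT loads:", loads)
    print("Block loads:", naive_assign_loads(costs, 2))
```

With these made-up costs, LPT gives both workers a load of 104.0, while the contiguous split gives 199.0 and 9.0, so the slowest worker would take almost twice as long. Real GPU load mapping must also account for warp-level divergence, which this sketch ignores.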
Despite the development of various distributed graph systems, little attention has been paid to the granularity of computation and communication, which can significantly impact overall efficiency. Moreover, users ofte...
ISBN (print): 9783031661457; 9783031661464
We present NADA, a Network Attached Deep learning Accelerator. It provides a flexible hardware/software framework for training deep neural networks on Ethernet-based FPGA clusters. The NADA hardware framework instantiates a dedicated entity for each layer in a model, and features and gradients flow through these entities in a tightly pipelined manner. From a compact description of a model and the target cluster, the NADA software framework generates a specific configuration bitstream for each FPGA in the cluster. We demonstrate the scalability and flexibility of our approach by mapping an example CNN onto clusters of three to nine Intel Arria 10 FPGAs. To verify NADA's effectiveness on commonly used networks, we train MobileNetV2 on a six-node cluster. We address the inherent incompatibility of the tightly pipelined, layer-parallel approach with batch normalization by using online normalization instead.
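Batch normalization needs statistics over a whole batch before any sample can leave the layer, which conflicts with samples streaming one at a time through a tight pipeline. The minimal sketch below shows the general idea behind normalizing from running statistics instead: each sample is normalized using an exponentially decayed mean and variance computed from earlier samples only. It is a simplified, scalar, forward-only illustration under stated assumptions (the class name, the `decay`/`eps` defaults, and the initial mean 0 and variance 1 are all hypothetical); it omits the backward-pass corrections of the published online normalization method and is not NADA's hardware implementation.

```python
import math
from typing import Iterable, List


class RunningNormalizer:
    """Hypothetical per-sample normalizer using exponentially decayed statistics.

    Forward-only simplification: the output for sample x depends only on x and
    on statistics accumulated from earlier samples, so no batch must be buffered.
    """

    def __init__(self, decay: float = 0.99, eps: float = 1e-5) -> None:
        if not 0.0 < decay < 1.0:
            raise ValueError("decay must be in (0, 1)")
        self.decay = decay
        self.eps = eps
        self.mean = 0.0  # assumed initial running mean
        self.var = 1.0   # assumed initial running variance

    def step(self, x: float) -> float:
        # Normalize with the statistics available *before* this sample arrives.
        y = (x - self.mean) / math.sqrt(self.var + self.eps)
        # Exponentially weighted update of mean and variance.
        delta = x - self.mean
        self.mean = self.decay * self.mean + (1.0 - self.decay) * x
        self.var = self.decay * self.var + self.decay * (1.0 - self.decay) * delta * delta
        return y

    def run(self, xs: Iterable[float]) -> List[float]:
        """Normalize a stream of samples one by one, in arrival order."""
        return [self.step(x) for x in xs]
```

Because `step` never looks ahead, a pipelined layer could emit each output as soon as its input arrives; in practice one such set of statistics would be kept per channel rather than a single scalar.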
Federated learning (FL) has been widely adopted as a privacy-preserving model training paradigm. However, the traditional FL protocol relies heavily on data transmission between clients and servers across the wide-area ne...
Training large-scale deep neural networks (DNNs) with large numbers of parameters requires significant computational resources. Despite rapid advances in GPU technology, limited budgets have forced many inst...
ISBN (print): 9798400702341
The manual deployment of applications distributed across the cloud, fog, and edge is complex and error-prone. TOSCA is a standard for modeling the deployment of cloud applications in a vendor-neutral and technology-independent manner that is also suitable for the fog and edge continuum. However, various TOSCA orchestrators exist, each with different functionality. Selecting an appropriate TOSCA orchestrator therefore requires technical expertise, since every available orchestrator must be analyzed with respect to technical, functional, legal, and organizational requirements. In this paper, we tackle this issue and present a systematic technology review of TOSCA orchestrators. Our goal is to support project managers, developers, and researchers in selecting a suitable TOSCA orchestrator. To this end, we select actively maintained, general-purpose, open-source TOSCA orchestrators. Moreover, we introduce the TOSCA Orchestrator Classification Framework and present a selection support system.
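The abstract says that orchestrators must be weighed against technical, functional, legal, and organizational requirements, but it does not describe how the selection support system does this. The minimal sketch below shows one plausible approach, hard constraints followed by weighted scoring of optional features. Everything in it is hypothetical: the `Candidate` fields, feature labels, license strings, weights, and orchestrator names are placeholders, not data from the review or from the TOSCA Orchestrator Classification Framework.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical description of one orchestrator against review criteria."""
    name: str
    features: FrozenSet[str]   # functional/technical capabilities (placeholder labels)
    license: str               # legal criterion
    actively_maintained: bool  # organizational criterion


def select(
    candidates: List[Candidate],
    required: Set[str],
    allowed_licenses: Set[str],
    preferred: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Filter by hard requirements, then rank survivors by weighted optional features."""
    ranked: List[Tuple[str, float]] = []
    for c in candidates:
        # Hard constraints: every required feature, an acceptable license,
        # and active maintenance. Failing any one excludes the candidate.
        if not required <= c.features:
            continue
        if c.license not in allowed_licenses or not c.actively_maintained:
            continue
        # Soft constraints: sum the weights of the preferred features offered.
        score = sum(w for feat, w in preferred.items() if feat in c.features)
        ranked.append((c.name, score))
    # Highest score first; ties broken by name for a deterministic order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    # Purely illustrative candidates; none of these names refer to real tools.
    pool = [
        Candidate("orch-a", frozenset({"deploy", "undeploy", "policies"}), "Apache-2.0", True),
        Candidate("orch-b", frozenset({"deploy", "undeploy", "workflows", "policies"}), "Apache-2.0", True),
        Candidate("orch-c", frozenset({"deploy", "undeploy", "workflows"}), "proprietary", True),
    ]
    print(select(pool, {"deploy", "undeploy"}, {"Apache-2.0", "MIT"},
                 {"workflows": 2.0, "policies": 1.0}))
```

Separating hard filters from soft scores keeps legal and organizational knock-out criteria from being traded off against nice-to-have features; a real selection process would likely also need qualitative judgment that a numeric score cannot capture.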
The virtual power plant is an important part of the new power system. Its IoT components need to collect and process energy equipment measurement data in real time, which necessarily involves stream computing. This paper i...