ISBN: (Print) 9798350326598; 9798350326581
Training and deploying large-scale machine learning models is time-consuming, requires significant distributed computing infrastructure, and incurs high operational costs. Our analysis, grounded in real-world large model training on datacenter-scale infrastructures, reveals that 14% to 32% of all GPU hours are spent on communication with no overlapping computation. To minimize this outstanding communication latency and other inherent at-scale inefficiencies, we introduce an agile performance modeling framework, MAD-Max. This framework is designed to optimize parallelization strategies and facilitate hardware-software co-design opportunities. By applying MAD-Max to a suite of real-world large-scale ML models on state-of-the-art GPU clusters, we showcase potential throughput enhancements of up to 2.24x for pre-training and up to 5.27x for inference scenarios.
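As a rough illustration of the kind of reasoning such a performance model performs, the sketch below estimates per-step time as compute time plus whatever communication cannot be overlapped, and compares hypothetical parallelization configurations. The cost model, class names, and numbers are illustrative assumptions only, not MAD-Max's actual model.

```python
# Minimal sketch of a step-time estimate that a performance modeling framework
# could use to compare parallelization strategies. All names and numbers are
# illustrative assumptions, not the paper's model.

from dataclasses import dataclass

@dataclass
class ParallelConfig:
    name: str
    compute_s: float         # pure compute time per training step (seconds)
    comm_s: float            # total communication time per step (seconds)
    overlap_fraction: float  # share of comm hidden behind compute (0..1)

def step_time(cfg: ParallelConfig) -> float:
    """Exposed communication is whatever cannot be overlapped with compute."""
    exposed_comm = cfg.comm_s * (1.0 - cfg.overlap_fraction)
    return cfg.compute_s + exposed_comm

configs = [
    ParallelConfig("tensor-parallel heavy", compute_s=0.80, comm_s=0.40, overlap_fraction=0.3),
    ParallelConfig("pipeline + data parallel", compute_s=0.85, comm_s=0.25, overlap_fraction=0.7),
]

for cfg in configs:
    t = step_time(cfg)
    exposed_share = cfg.comm_s * (1 - cfg.overlap_fraction) / t
    print(f"{cfg.name}: {t:.3f} s/step, {exposed_share:.0%} of the step is exposed communication")
```

Scanning a space of such configurations and picking the one with the lowest estimated step time is the basic optimization loop this kind of framework automates; the real model would of course account for topology, collective algorithms, and memory limits.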
ISBN: (Print) 9783031618345
The proceedings contain 8 papers. The special focus in this conference is on Distributed Computer and Communication Networks. The topics include: Transient Behavior of the Photonic Switch with Duplication of Switching Elements in the All-Optical Network with Heterogeneous Traffic; Examining the Performance of a Distributed System Through the Application of Queuing Theory; Constructive Approach to Multi-Position Passive Acoustic Localization in Information-Measurement Systems; Broadband Wireless Networks Based on Tethered High-Altitude Unmanned Platforms; FPGA Implementation of a Decoder with Low-Density Parity Checks Based on the Minimum Sum Algorithm for 5G Networks; Selecting the Performance Metrics to Control the CPU Oversubscription Ratio in a Cloud Server.
ISBN: (Print) 9798400704130
Workflow systems provide a convenient way for users to write large-scale applications by composing independent tasks into large graphs that can be executed concurrently on high-performance clusters. In many newer workflow systems, tasks are often expressed as a combination of function invocations in a high-level language. Because the necessary code and data are not statically known prior to execution, they must be moved into the cluster at runtime. An obvious way of doing this is to translate function invocations into self-contained executable programs and run them as usual, but this brings a hefty performance penalty: a function invocation now needs to piggyback its context, with extra code and data, to a remote node, and the remote node needs to take extra time to reconstruct the invocation's context before executing it, both detrimental to lightweight, short-running functions. A better solution for workflow systems is to treat functions and invocations as first-class abstractions: subsequent invocations of the same function on a worker node should pay the cost of context setup only once and reuse the context across invocations. The remaining problems lie in discovering, distributing, and retaining the reusable context among workers. In this paper, we discuss the rationale and design requirements of these mechanisms to support context reuse, and implement them in TaskVine, a data-intensive distributed framework and execution engine. Our results from executing a large-scale neural network inference application and a molecular design application show that treating functions and invocations as first-class abstractions reduces the execution time of the applications by 94.5% and 26.9%, respectively.
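To illustrate the context-reuse idea in isolation, the sketch below shows a worker-side cache that builds a function's execution context on the first invocation and reuses it for later invocations of the same function. The class and method names are hypothetical and are not TaskVine's API.

```python
# Minimal sketch of context reuse on a worker: the cost of loading a function's
# code and data is paid once, then cached for subsequent invocations.
# Hypothetical classes; not TaskVine's actual interface.

import importlib

class WorkerContextCache:
    def __init__(self):
        self._contexts = {}   # function source (module name) -> loaded context

    def _load_context(self, module_name: str):
        # Expensive step: fetch code/data and set up the execution environment.
        return importlib.import_module(module_name)

    def invoke(self, module_name: str, func_name: str, *args, **kwargs):
        ctx = self._contexts.get(module_name)
        if ctx is None:                       # first invocation: build the context
            ctx = self._load_context(module_name)
            self._contexts[module_name] = ctx
        return getattr(ctx, func_name)(*args, **kwargs)

cache = WorkerContextCache()
# The second call reuses the already-loaded "math" context instead of
# reconstructing it, which is where the savings for short-running tasks come from.
print(cache.invoke("math", "sqrt", 2.0))
print(cache.invoke("math", "sqrt", 3.0))
```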
This paper proposes a novel multi-agent framework for penetration testing that aims to enable efficient and adaptive collaboration of specialised agents. This framework uses the Blackboard system for communication bet...
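The abstract names blackboard-based coordination between specialised agents; as a rough illustration of that pattern, the sketch below has agents post findings to and read findings from a shared blackboard. The agent roles and data schema are assumptions for illustration only, not the paper's framework.

```python
# Minimal sketch of blackboard-style coordination between specialised agents.
# Roles and data fields are illustrative assumptions.

class Blackboard:
    def __init__(self):
        self.facts = []               # shared findings visible to every agent

    def post(self, source, fact):
        self.facts.append({"source": source, "fact": fact})

    def read(self):
        return list(self.facts)

class ReconAgent:
    def run(self, board: Blackboard):
        board.post("recon", {"open_port": 22, "service": "ssh"})

class ExploitAgent:
    def run(self, board: Blackboard):
        # React to whatever other agents have already discovered.
        for entry in board.read():
            if entry["fact"].get("service") == "ssh":
                board.post("exploit", {"attempt": "credential audit on port 22"})

board = Blackboard()
for agent in (ReconAgent(), ExploitAgent()):
    agent.run(board)
print(board.read())
```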
Secure Socket Shell, also known as Secure Shell, refers to the cryptographic network protocol and suite of implementation utilities that help users connect to a computer over an unsecured network. Although SSH provides ...
ISBN: (Print) 9798331540746
The proceedings contain 32 papers. The topics discussed include: empirical study on request timeout and retry for microservices communication; RL-based approach to enhance reliability and efficiency in autoscaling for heterogeneous edge serverless computing environments; robustness of redundancy-hardened convolutional neural networks against adversarial attacks; bridging gaps between scenario-based safety analysis and simulation-based testing for autonomous driving systems; sequential programming for distributed algorithm verification; selecting nodes to protect in interdependent networks using Shapley value analysis; smart building control system emulation platform for security testing; and construction of VDM++ specifications from extended screen transition diagrams for validation of microservice-based web applications.
ISBN: (Print) 9798350311990
Spiking neural networks (SNNs) have gained attention as a promising alternative to traditional artificial neural networks (ANNs) due to their potential for energy efficiency and their ability to model spiking behavior in biological systems. However, training SNNs remains a challenging problem, and new techniques are needed to improve their performance. In this paper, we study the impact of skip connections on SNNs and propose a hyperparameter optimization technique that adapts models from ANN to SNN. We demonstrate that optimizing the position, type, and number of skip connections can significantly improve the accuracy and efficiency of SNNs by enabling faster convergence and increasing information flow through the network. Our results show an average accuracy increase of +8% on the CIFAR-10-DVS and DVS128 Gesture datasets when adapting multiple state-of-the-art models.
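The sketch below illustrates, under simplifying assumptions, what a search over skip-connection placement, type, and count could look like; the search space and the stubbed evaluation function are illustrative stand-ins for actually converting an ANN to an SNN and measuring its validation accuracy, which is what the paper's optimization would do.

```python
# Minimal sketch of a skip-connection hyperparameter search for ANN-to-SNN
# adaptation. The candidate space and evaluate() stub are illustrative only.

import itertools
import random

def evaluate(config):
    # Placeholder for "convert the ANN to an SNN with this skip layout,
    # train briefly, and measure validation accuracy". Random here so the
    # sketch runs standalone.
    random.seed(hash(config) & 0xFFFF)
    return random.uniform(0.6, 0.9)

positions = [(1, 3), (2, 4), (1, 4)]      # candidate (from_layer, to_layer) pairs
skip_types = ["additive", "concat"]
max_skips = 2

best = None
for count in range(1, max_skips + 1):
    for placement in itertools.combinations(positions, count):
        for skip_type in skip_types:
            config = (placement, skip_type)
            score = evaluate(config)
            if best is None or score > best[1]:
                best = (config, score)

print("best skip configuration:", best)
```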
Autonomous vehicles (AVs) increasingly rely on vehicle-to-everything (V2X) networks for communication. However, due to the devices' heterogeneity, they are more susceptible to attacks like distributed denial of se...
ISBN: (Print) 9783031783791; 9783031783807
Early Exit Neural Networks (EENNs) achieve enhanced efficiency compared to traditional models, but creating them is challenging due to the many additional design choices required. To address this, we propose an automated augmentation flow that converts existing models into EENNs, making all necessary design decisions for deployment on heterogeneous or distributed embedded targets. Our framework is the first to perform all of these steps, including EENN architecture construction, subgraph mapping, and decision mechanism configuration. We evaluated our approach on embedded Deep Learning scenarios, achieving significant performance improvements: our solution reduced latency by 65.95% on a speech command detection problem and mean operations per inference by 78.3% on an ECG classification task. This showcases the potential of EENNs for embedded applications.
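A minimal sketch of the early-exit mechanism itself, assuming a two-stage PyTorch model and a confidence threshold chosen purely for illustration: inference returns from the first exit head whenever its prediction is confident enough, which is where the latency and per-inference operation savings come from. This is not the paper's augmentation flow, only the runtime behavior it produces.

```python
# Minimal sketch of an early-exit forward pass with a confidence-based
# decision mechanism. Architecture and threshold are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitNet(nn.Module):
    def __init__(self, threshold: float = 0.8):
        super().__init__()
        self.stage1 = nn.Linear(16, 32)
        self.exit1 = nn.Linear(32, 10)       # cheap early classifier
        self.stage2 = nn.Linear(32, 32)
        self.exit2 = nn.Linear(32, 10)       # final classifier
        self.threshold = threshold

    def forward(self, x):
        h = F.relu(self.stage1(x))
        probs1 = F.softmax(self.exit1(h), dim=-1)
        if probs1.max().item() >= self.threshold:
            return probs1, "exit1"           # confident: skip the rest of the network
        h = F.relu(self.stage2(h))
        return F.softmax(self.exit2(h), dim=-1), "exit2"

model = EarlyExitNet()
probs, taken_exit = model(torch.randn(1, 16))   # single sample, so one exit decision
print(taken_exit, probs.argmax(dim=-1).item())
```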
ISBN: (Print) 9798400700958
SCALO is the first distributed brain-computer interface (BCI) consisting of multiple wireless-networked implants placed on different brain regions. SCALO unlocks new treatment options for debilitating neurological disorders and new research into brain-wide network behavior. Achieving the fast and low-power communication necessary for real-time processing has historically restricted BCIs to single brain sites. SCALO also adheres to tight power constraints, yet enables fast distributed processing. Central to SCALO's efficiency is its realization as a full-stack distributed system of brain implants with accelerator-rich compute. SCALO balances modular system layering with aggressive cross-layer hardware-software co-design to integrate compute, networking, and storage. The result is a lesson in designing energy-efficient networked distributed systems with hardware accelerators from the ground up.