Training and deploying large-scale machine learning models is time-consuming, requires significant distributed computing infrastructures, and incurs high operational costs. Our analysis, grounded in real-world large m...
详细信息
ISBN:
(纸本)9798350326598;9798350326581
Training and deploying large-scale machine learning models is time-consuming, requires significant distributed computing infrastructures, and incurs high operational costs. Our analysis, grounded in real-world large model training on datacenter-scale infrastructures, reveals that 14 similar to 32% of all GPU hours are spent on communication with no overlapping computation. To minimize this outstanding communication latency and other inherent at-scale inefficiencies, we introduce an agile performance modeling framework, MAD-Max. This framework is designed to optimize parallelization strategies and facilitate hardware-software co-design opportunities. Through the application of MAD-Max to a suite of real-world large-scale ML models on state-of-the-art GPU clusters, we showcase potential throughput enhancements of up to 2.24x for pre-training and up to 5.27x for inference scenarios, respectively.
Nowadays, smart mobile devices (SMDs) support various computation-intensive and delay-sensitive applications, e.g., online games, and figure compression. However, SMDs have limited computing resources and battery ener...
详细信息
This research examines 5G-enabled IoT devices' real-time data processing capability and constraints. A thorough literary study helps understand current technologies, application situations, and industrial difficul...
详细信息
The IoT is a relatively new and exciting field in computer science that is expanding rapidly worldwide. Background: IoT has seen substantial growth due to advancements in wireless networking, enhanced sensors, and IoT...
详细信息
As the computing landscape evolves, system designers continue to explore design methodologies that leverage increased levels of heterogeneity to push performance within limited size, weight, power, and cost budgets. O...
详细信息
ISBN:
(纸本)9798350311990
As the computing landscape evolves, system designers continue to explore design methodologies that leverage increased levels of heterogeneity to push performance within limited size, weight, power, and cost budgets. One such methodology is to build Domain-Specific System on Chips (DSSoCs) that promise increased productivity through narrowed scope of their target application domain. In previous works, we have proposed CEDR, an open source, unified compilation and runtime framework for DSSoC architectures that allows applications, scheduling heuristics, and accelerators to be co-designed in a cohesive manner that maximizes system performance. In this work, we present changes to the application development workflow that enable a more productive and expressive API-based programming methodology. These changes allow for more rapid integration of new applications without sacrificing application performance. Towards the design of heterogeneous SoCs with rich set of accelerators, in this study we experimentally study the impact of increase in workload complexity and growth in the pool of compute resources on execution time of dynamically arriving workloads composed of real-life applications executed over architectures emulated on Xilinx ZCU102 MPSoC and Nvidia Jetson AGX Xavier. We expand CEDR into the application domain of autonomous vehicles, and we find that API-based CEDR achieves a runtime overhead reduction of 19.5% with respect to the original CEDR.
Many parallel and distributed computing research results are obtained in simulation, using simulators that mimic real-world executions on some target system. Each such simulator is configured by picking values for par...
详细信息
ISBN:
(纸本)9798350364613;9798350364606
Many parallel and distributed computing research results are obtained in simulation, using simulators that mimic real-world executions on some target system. Each such simulator is configured by picking values for parameters that define the behavior of the underlying simulation models it implements. The main concern for a simulator is accuracy: simulated behaviors should be as close as possible to those observed in the real-world target system. This requires that values for each of the simulator's parameters be carefully picked, or "calibrated," based on ground truth real-world executions. Examining the current state of the art shows that simulator calibration, at least in the field of parallel and distributed computing, is often undocumented (and thus perhaps often not performed) and, when documented, is described as a labor-intensive, manual process. In this work we evaluate the benefit of automating simulation calibration using simple algorithms. Specifically, we use a real-world case study from the field of High Energy Physics and compare automated calibration to calibration performed by a domain scientist. Our main finding is that automated calibration is on par with or significantly outperforms the calibration performed by the domain scientist. Furthermore, automated calibration makes it straightforward to operate desirable trade-offs between simulation accuracy and simulation speed.
Command and Control (C2) agents are a critical component of many cyberattacks, enabling adversaries to maintain covert control over compromised systems. In recent years, attackers have increasingly leveraged real-worl...
详细信息
Music source separation aims to separate polyphonic music into different types of sources. Most existing methods focus on enhancing the quality of separated results by using a larger model structure, rendering them un...
详细信息
Weed management remains a critical challenge in agriculture, where weeds compete with crops for essential resources, leading to significant yield losses. Accurate detection of weeds at various growth stages is crucial...
详细信息
This research elucidates the development and optimization of advanced sensor network-based health monitoring systems. In response to the increasing need for accurate health data in real-time, advanced gadgets capable ...
详细信息
暂无评论