ISBN (print): 9798350368543; 9798350368536
Recently, many deep learning models have been trained in geographically distributed data centers. The carbon emitted by training these models contributes to climate change, for example through rising temperatures. Existing studies struggle to shift a training workload to a data center with low carbon emissions, and thus fail to keep the workload's emissions low during training, especially for long-running workloads such as Large Language Models (LLMs). To cope with this problem, we propose a method that shifts the workload to a cloud with low carbon emissions even when computational resources there are scarce. Specifically, we define a task scheduler whose states and transitions migrate mini-batches dynamically. Next, we present a fault-tolerant control that optimizes GPU frequency to adapt to workload variations during training while bounding power consumption. Last, we conducted exhaustive experiments on real-world data, comparing carbon emissions, transfer time, and power consumption against state-of-the-art methods.
Authors: Hu, Zheyuan; Niu, Jianwei; Ren, Tao
Affiliations: Beihang Univ, Sch Comp Sci & Engn, State Key Lab Virtual Real Technol & Syst, Beijing, Peoples R China; Chinese Acad Sci, Inst Software, Lab Internet Software Technol, Beijing, Peoples R China; Zhengzhou Univ, Sch Informat Engn, Res Inst Ind Technol, Zhengzhou, Peoples R China
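The abstract above does not detail its fault-tolerant frequency control; as a rough illustration of the general idea (the function name, candidate frequencies, and the cubic dynamic-power model P = base + k·f³ are all assumptions, not the paper's model), a controller could pick the highest GPU frequency whose predicted power stays under a cap:

```python
def pick_frequency(freqs_mhz, power_cap_w, base_w=50.0, k=1e-7):
    """Pick the highest frequency whose modeled power fits the cap.

    Assumes a simple first-order dynamic-power model
    P = base_w + k * f**3, which is illustrative only.
    """
    feasible = [f for f in sorted(freqs_mhz)
                if base_w + k * f ** 3 <= power_cap_w]
    if not feasible:
        raise ValueError("no frequency satisfies the power cap")
    return feasible[-1]
```

As the workload's power draw varies, re-running the selection with an updated cap or model coefficients yields the adaptive behavior the abstract describes.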
ISBN (print): 9798350339864
Mobile edge computing (MEC) has been proposed as a promising paradigm to provide mobile devices with both satisfactory computing capacity and task latency. One key issue in MEC is computation offloading (CompOff), which has attracted numerous research interests. Most existing CompOff approaches are developed based on iterative programming (IterProg), which calculates a CompOff action from system dynamics each time mobile tasks arrive. Due to IterProg's heavy dependency on reliable system dynamics, as well as its online computational burden, recent years have seen a popular trend of developing CompOff approaches based on deep reinforcement learning (DRL), which can generate real-time, model-free CompOff actions. However, due to the intrinsically poor generalization of DRL, it is hard to directly apply DRL-based policies in new MEC environments, and long fine-tuning is often required. To address this challenge, this paper proposes a fast transferable CompOff framework (named TransOff) based on the idea of embedded reinforcement learning. Specifically, TransOff is composed of multiple primitive CompOff policies (pCOPs) and a multiplicative composition function (MCF). The pCOPs and MCF are pre-trained across a diverse variety of MEC environments. When encountering a new MEC environment, the pCOPs are kept fixed to prevent catastrophic forgetting of pre-trained CompOff skills, while only the MCF is fine-tuned to produce new compositions of pCOPs, achieving fast transfer. We conduct extensive experiments on both numerical simulation and a real testbed, demonstrating the fast transfer ability of TransOff compared to state-of-the-art DRL-based and meta-learning-based CompOff approaches.
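Multiplicative composition of primitive policies can be illustrated with Gaussian action distributions, where a weighted product of Gaussians is again Gaussian with precision-weighted mean. This is only a sketch of the general technique; the paper's exact MCF parameterization is not given here, and all names below are hypothetical:

```python
def compose_gaussians(means, variances, weights):
    """Multiplicatively compose Gaussian primitive policies.

    The weighted product of Gaussians is itself Gaussian:
    precision = sum_i w_i / var_i, with a precision-weighted mean.
    The weights would come from a learned composition function.
    """
    precision = sum(w / v for w, v in zip(weights, variances))
    mean = sum(w * m / v
               for w, m, v in zip(weights, means, variances)) / precision
    return mean, 1.0 / precision
```

Fine-tuning only the weights while keeping each primitive's mean and variance fixed mirrors the freeze-pCOPs, tune-MCF scheme described above.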
ISBN (print): 9798350304831
In this paper, we investigate experimentally the use of auctioning as a method for optimizing task orchestration in distributed computing systems by making selfish agents compete to execute computational tasks. Our goal is to find an approach that can improve the performance of these systems, using a deadline, fines, and reward limits in a reverse second-price sealed-bid auction to incentivize and control the system, specifically in terms of improving task throughput and power consumption. With improvements to both energy consumption and task throughput, we have developed a promising approach that scales with the number of machines in the system. Results suggest that this type of auction may be useful for improving the implementation of such systems in a wide range of scenarios.
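The auction mechanism itself is standard and can be sketched directly. The helper below (names and inputs are illustrative, not the authors' implementation) awards the task to the lowest bidder able to meet the deadline and pays it the second-lowest eligible bid:

```python
def reverse_second_price(bids, deadline_capable):
    """Run a reverse second-price sealed-bid auction.

    bids: dict mapping agent -> asking price to execute the task.
    deadline_capable: set of agents able to meet the deadline.
    The lowest eligible bidder wins and is paid the second-lowest
    eligible bid (or its own bid if it is the only bidder).
    """
    eligible = {a: b for a, b in bids.items() if a in deadline_capable}
    if not eligible:
        return None, None
    ranked = sorted(eligible, key=eligible.get)
    winner = ranked[0]
    payment = eligible[ranked[1]] if len(ranked) > 1 else eligible[winner]
    return winner, payment
```

Paying the second price rather than the winning bid is what makes truthful bidding the dominant strategy for the selfish agents.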
ISBN (print): 9798350358810; 9798350358803
The 21st century has brought widespread adoption of the cloud computing paradigm when constructing large-scale software systems. While the approach has enabled systems to exhibit software quality attributes such as high agility and scalability, it has also magnified long-standing issues related to maintaining both software development standards and their accompanying software implementations. The research aims to improve or maintain the quality of consumer-driven software as cloud computing adoption becomes more prevalent. Our model-driven approach offers solutions to the typical disconnect between design and implementation. The paper demonstrates the construction of a cloud-based healthcare application using a modeling and simulation language called Colored Petri Nets (CPNs), along with design verification during model simulation and software prototyping. The healthcare domain serves as an ideal candidate since it usually lags behind other industries in updating software development approaches and tools. Furthermore, a cloud-first approach allows for infrastructure that supports millions of patients and physicians distributed globally. The paper aims to promote the technical merits of model-driven design in maintaining high assurance of software quality standards, especially in the context of cloud-based applications. The key contributions of the paper are the adoption of model-driven system development in the context of cloud-based applications and the integration of machine-learning models into simulation models for such purposes.
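The firing semantics underlying Petri-net models can be illustrated with a minimal place/transition net. This sketch omits the colored (typed) tokens that distinguish CPNs, and all names are illustrative:

```python
def fire(marking, transition):
    """Fire a transition if enabled; return the new marking, else None.

    marking: dict place -> token count.
    transition: (consume, produce) pair of dicts place -> count.
    This is a plain place/transition net; Colored Petri Nets
    additionally attach typed data values to each token.
    """
    consume, produce = transition
    if any(marking.get(p, 0) < n for p, n in consume.items()):
        return None  # not enabled: some input place lacks tokens
    new = dict(marking)
    for p, n in consume.items():
        new[p] -= n
    for p, n in produce.items():
        new[p] = new.get(p, 0) + n
    return new
```

Simulating a model is then just repeatedly firing enabled transitions, which is what enables design verification before any implementation code exists.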
ISBN (print): 9798350364200; 9798350364194
The distributed optimization problem is examined in this study in the presence of heavy-tailed gradient noises. A heavy-tailed noise ξ may not satisfy the usual bounded-variance requirement; instead, it meets a broader assumption, of which Pareto-distributed noise is a common example. A number of distributed optimization techniques have been developed for the heavy-tailed noise setting; however, these algorithms require a centralized server within the network to gather client data. We propose a distributed method using gradient clipping that, unlike these algorithms, accounts for the absence of a centralized server.
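Gradient clipping, the core tool named above, bounds the L2 norm of each stochastic gradient so that heavy-tailed noise cannot produce arbitrarily large updates. A minimal sketch, not the paper's algorithm:

```python
import math

def clip_gradient(grad, threshold):
    """Clip a gradient vector to a maximum L2 norm.

    Under heavy-tailed noise (e.g. Pareto-distributed), raw
    stochastic gradients can have unbounded variance; rescaling
    any gradient longer than the threshold bounds each update.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold:
        return list(grad)
    scale = threshold / norm
    return [g * scale for g in grad]
```

In a decentralized setting, each node would clip its local gradient before averaging with its neighbors, so no central server ever needs to see the raw gradients.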
ISBN (print): 9798350340754
The pioneering research on message passing for Java (MPJ), which started after 1995, provided a crucially important framework and programming environment for parallel and distributed computing with Java. This work resulted in an industry-standard specification and a novel MPJ-based hierarchical development methodology for a new generation of large-scale distributed systems. The core contributions presented in this paper are the invention of a novel component-based model and a methodology for rapid distributed software development and execution, based on the MPJ work and its achievements. Built on this high-performance Java component-based model and its concepts and research results, grid, cloud, and extreme-scale computing represent a fundamental shift in the delivery of information technology services that has permanently changed the computing landscape.
ISBN (print): 9798350386851; 9798350386844
Volunteer computing (VC) is a type of network computing which exploits idle computing resources provided by a vast number of users on the Internet. In our previous work, we implemented a prototype system of parallel VC using a server-assisted communication approach and demonstrated the feasibility of parallel computing, rather than the distributed computing of current VC, in a VC environment. In this paper, we evaluate the effectiveness of parallel VC. The volatility and unreliability of nodes in VC mean that parallel computing is not always effective. Through simulations using open trace data of node behavior, we identify the parallelism and redundancy of job execution that minimize execution time while maintaining a high degree of reliability.
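The redundancy/reliability trade-off can be illustrated with an idealized model: if each replica of a job fails independently with probability p, the smallest replication factor meeting a reliability target is (a sketch under stated assumptions; real volunteer nodes are neither independent nor identical):

```python
def min_redundancy(p_fail, target_reliability, max_r=20):
    """Smallest replica count r such that at least one copy of a job
    completes with probability >= target_reliability, assuming
    independent replicas each failing with probability p_fail.
    Returns None if no r up to max_r suffices.
    """
    for r in range(1, max_r + 1):
        if 1.0 - p_fail ** r >= target_reliability:
            return r
    return None
```

Higher redundancy raises reliability but consumes volunteer resources that could run other jobs, which is exactly the tension the trace-driven simulations above explore.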
ISBN (print): 9781665464598
Modern software systems in every application domain are increasingly built as distributed systems. Business applications are structured as cooperating microservices, IoT devices communicate with cloud-based services over a network, and Web sites store data in globally dispersed data centers to support fast access in the localities in which their users reside. Behind all these systems lurk distributed computing infrastructures that architects and engineers must exploit to satisfy application service level agreements. To be successful, it is essential that architects understand the inherent complexity of distributed systems. In this half-day tutorial, I'll guide attendees through the fundamental characteristics that distributed systems exhibit. Each characteristic will be related to the software architecture quality attributes that it directly impacts. The topics covered include communications reliability and latencies, message delivery semantics, state management, idempotence, data safety, consistency, time, distributed consensus, cascading failures, and failover and recovery. I'll introduce each concept using an example distributed system and multiple 'props' to illustrate concepts. Once I've explained a concept using the example, I'll move on to show how the concept manifests itself in a software system and its effects on quality attribute requirements and inherent trade-offs. The tutorial will be suitable for graduate students, engineers, and architects who have no or minimal exposure to distributed systems concepts. The presentation format will suit a mix of both in-person and remote participants, combining interactive sessions with short technical explanations and examples to illustrate each distributed systems concept.
ISBN (digital): 9781665481311
ISBN (print): 9781665481465
In this panel contribution, I will discuss my vision of the need to develop new management technologies to harness distributed "computing continuum" systems. These systems execute concurrently across multiple computing tiers: Cloud, Fog, Edge, and IoT. This simple idea raises manifold challenges due to the inherent complexity inherited from the underlying infrastructures of these systems, which makes current methodologies for managing Internet distributed systems inappropriate: those methodologies descend from early systems that were based on client/server architectures and were completely specified by the application software.
Fog computing, an extension of cloud computing, enhances capabilities by processing data closer to the source, thereby addressing latency and bandwidth issues inherent in traditional cloud models. However, the integration of Artificial Intelligence (AI) into fog computing introduces challenges, particularly in resource management, security, and privacy. This paper systematically reviews AI applications within fog computing environments, adhering to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology to ensure rigorous analysis. The studies were selected based on predefined inclusion criteria, including research published between 2010 and 2024 in peer-reviewed journals and conference papers, with searches conducted in databases including IEEE Xplore, ACM Digital Library, SpringerLink, and Scopus. The review identifies critical issues such as resource constraints, transparency in AI-driven security systems, and the need for adaptable AI models to address evolving security threats. In response, innovative solutions such as lightweight AI models (e.g., Pruned Neural Networks, Quantized Models, Knowledge Distillation), Explainable AI (XAI) (e.g., Model-Agnostic Methods, Feature Importance Analysis, Rule-Based Approaches), and federated learning are proposed. Additionally, a novel taxonomy is introduced, categorizing AI techniques into resource management, security enhancement, and privacy-preserving methods, offering a structured framework for researchers and practitioners. The paper concludes that effective AI integration in fog computing is essential for developing secure, efficient, and adaptable distributed systems, with significant implications for both academia and industry.
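Federated learning, one of the solutions the review highlights, keeps raw data on fog nodes and aggregates only model parameters. A minimal sketch of the FedAvg-style example-weighted average (all names are illustrative):

```python
def federated_average(client_weights, client_sizes):
    """Aggregate client model weights by example-weighted averaging.

    client_weights: list of per-client weight vectors (equal length).
    client_sizes: number of local training examples per client.
    Only these parameter vectors leave each fog node; the raw
    (potentially sensitive) data never does.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]
```

Weighting by local dataset size means a node with more examples pulls the global model further toward its local optimum, which is the standard FedAvg design choice.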