The surge in demand for energy-efficient computing has spurred the exploration of cutting-edge techniques to optimize power consumption in modern computing systems. Though the traditional implementation of Dynamic Vol...
ISBN:
(Print) 9798350339864
Deep neural networks (DNNs) are increasingly popular owing to their ability to solve complex problems such as image recognition, autonomous driving, and natural language processing. Their growing complexity, coupled with the use of larger volumes of training data (to achieve acceptable accuracy), has warranted the use of GPUs and other accelerators. Such accelerators are typically expensive, with users having to pay a high upfront cost to acquire them. For infrequent use, users can instead leverage the public cloud to mitigate the high acquisition cost. However, with the wide diversity of hardware instances (particularly GPU instances) available in the public cloud, it becomes challenging for a user to make an appropriate choice from a cost/performance standpoint. In this work, we address this problem by (i) introducing Stash, a comprehensive distributed deep learning (DDL) profiler that determines the various execution stalls that DDL suffers from, and (ii) using Stash to extensively characterize various public cloud GPU instances by running popular DNN models on them. Specifically, Stash estimates two types of communication stalls, namely interconnect and network stalls, that play a dominant role in DDL execution time. Stash is implemented on top of prior work, DS-analyzer, which computes only the CPU and disk stalls. Using our detailed stall characterization, we list the advantages and shortcomings of public cloud GPU instances to help users make an informed decision. Our characterization results indicate that the more expensive GPU instances may not be the most performant for all DNN models and that AWS can sometimes sub-optimally allocate hardware interconnect resources. Specifically, the intra-machine interconnect can introduce communication overheads of up to 90% of DNN training time, and network-connected instances can suffer from up to 5x slowdown compared to training on a single instance. Furthermore, (iii) we also model the impact of DNN m...
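The stall decomposition described above lends itself to a simple accounting model. The following Python sketch shows how per-iteration timings could be attributed to disk, CPU, interconnect, and network stalls and reported as fractions of iteration time; the data structure and field names are illustrative assumptions, not Stash's actual interface.

from dataclasses import dataclass

@dataclass
class IterationTiming:
    total: float         # wall-clock time of one training iteration (s)
    gpu_compute: float   # pure GPU compute time (s)
    disk: float          # time blocked on disk reads (s)
    cpu_prep: float      # time blocked on CPU preprocessing (s)
    interconnect: float  # time blocked on intra-machine GPU-GPU transfers (s)
    network: float       # time blocked on inter-instance gradient exchange (s)

def stall_breakdown(t: IterationTiming) -> dict:
    """Return each stall class as a fraction of total iteration time."""
    stalls = {
        "disk": t.disk,
        "cpu": t.cpu_prep,
        "interconnect": t.interconnect,
        "network": t.network,
    }
    return {name: value / t.total for name, value in stalls.items()}

# Toy numbers: an interconnect-bound iteration on a hypothetical GPU instance
timing = IterationTiming(total=1.0, gpu_compute=0.35, disk=0.05,
                         cpu_prep=0.10, interconnect=0.30, network=0.20)
print(stall_breakdown(timing))  # e.g. interconnect stall = 30% of the iteration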
ISBN:
(Print) 9798350307924
While increasingly more applications are tempted to manage their data in decentralized systems, such as blockchains or distributed ledgers, data exchange across multiple, potentially heterogeneous, decentralized systems remains an open problem: state-of-the-art protocols cannot meet one or more of the core requirements, such as atomicity, liveness, and scalability. Specifically, in the field of scientific computing, although a blockchain service was recently developed for scientific computing environments, data exchanges and transactions among distinct ledgers are not supported. Observing that many modern scientific applications involve collaboration among multiple teams and increasingly complicated (in-situ) workflows, we argue that there is a pressing need for an efficient and scalable protocol that lets distinct ledgers exchange data in scientific computing. This paper proposes a topological approach to enabling atomic, nonblocking, and scalable data exchanges among an arbitrary number of scientific ledgers in the context of collaborative scientific computing. Specifically, we construct a topological space formed by these ledgers, abstracting the nodes in a cross-ledger transaction as topological objects such as abstract simplices and simplicial complexes. These topological objects, in turn, serve as the building blocks of a topological protocol, namely TopoCommit, under practical assumptions. We implement TopoCommit and integrate it into SciChain, a recently published distributed ledger for tracking scientific data provenance. An extensive evaluation on up to 1,008 nodes and 144 distinct ledgers on CloudLab shows that TopoCommit outperforms state-of-the-art protocols by up to 70x.
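To make the topological abstraction concrete, the sketch below models each cross-ledger transaction as an abstract simplex whose vertices are the participating ledgers, and closes a set of transactions under faces (subsets) to form a simplicial complex. This is a hedged illustration of the construction named in the abstract, not the published TopoCommit code.

from itertools import combinations

def faces(simplex):
    """Yield all nonempty faces (subsets) of a simplex given as a frozenset."""
    vertices = sorted(simplex)
    for r in range(1, len(vertices) + 1):
        for face in combinations(vertices, r):
            yield frozenset(face)

def simplicial_complex(transactions):
    """Close a set of cross-ledger transactions under faces."""
    complex_ = set()
    for tx in transactions:
        complex_.update(faces(frozenset(tx)))
    return complex_

# Three hypothetical transactions spanning four ledgers L1..L4
txs = [{"L1", "L2", "L3"}, {"L2", "L4"}, {"L3", "L4"}]
K = simplicial_complex(txs)
print(len(K))  # number of simplices after face closure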
ISBN:
(Print) 9798350318562; 9798350318555
This paper proposes a decision-making approach for the control of distribution systems with distributed energy resources (DERs) equipped with photovoltaic (PV) units and battery energy storage systems (BESS). The objective is to minimize the total operational cost of the distribution system while satisfying the system operating constraints. The method is based on the discrete-time finite-horizon Markov Decision Process (MDP) framework. Different aspects of distribution system operation are considered, such as curtailment of PV generation, battery storage management, reactive power injection, load shedding, and the provision of a flexibility service to the transmission system. The model is tested on the IEEE 33-bus system with two added DERs, and the case studies involve various unexpected events. The experimental results show that this method achieves relatively low total costs compared to a reference deterministic approach. The benefits of the approach are particularly evident when there is a significant difference between the predicted and actual PV power generation.
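A discrete-time finite-horizon MDP of this kind is typically solved by backward induction over the planning horizon. The Python sketch below applies that recursion to a toy battery-dispatch problem; the state grid, action set, and tariff are placeholder assumptions, and the stochastic PV forecast error the paper handles is omitted for brevity.

import numpy as np

T = 24                                    # horizon: 24 hourly steps
soc_levels = np.linspace(0.0, 1.0, 11)    # battery state-of-charge grid
actions = [-0.1, 0.0, 0.1]                # discharge / idle / charge (fraction of capacity)
price = 0.2 + 0.1 * np.sin(np.arange(T) / T * 2 * np.pi)  # toy hourly tariff

V = np.zeros((T + 1, len(soc_levels)))    # terminal cost-to-go is zero
policy = np.zeros((T, len(soc_levels)), dtype=int)

for t in range(T - 1, -1, -1):            # backward induction
    for i, soc in enumerate(soc_levels):
        best = np.inf
        for k, a in enumerate(actions):
            nxt = soc + a
            if nxt < -1e-9 or nxt > 1.0 + 1e-9:
                continue                  # infeasible: SoC bounds violated
            j = min(max(int(round(nxt * 10)), 0), 10)  # next SoC grid index
            q = price[t] * a + V[t + 1, j]  # buying energy costs, selling earns
            if q < best:
                best, policy[t, i] = q, k
        V[t, i] = best

print(V[0, 5])  # expected cost-to-go from a half-charged battery at t = 0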
ISBN:
(Print) 9798350322811
In modern industry, adaptation to market changes, as well as prompt reaction to a variety of predictable and unpredictable events, is a key requirement. Ubiquitous computing, real-time analytics, and reconfigurable hardware/software components often coexist in the complex, internally variegated, and often proprietary systems traditionally deployed to meet this requirement. However, such tailor-made systems only partially meet the requirements of openness, security, monitorability, geographical distribution, and, most of all, remote extendability and changeability, which are crucial for prompt reaction to unforeseen circumstances. In this work, a containerized service application named Network Factory is presented. It enables the remote construction, configuration, and operation of resilient computation systems that meet the above-mentioned requirements and are distinguished by their logical simplicity and by the uniform addressing of computations and human-computer interfaces, achieved through a few reconfigurable components and communication mechanisms used from the production line up to the Cloud. Source code, documentation, and step-by-step introductory guides are publicly available in a dedicated GitHub repository and distributed under the CC-BY-4.0 license.
ISBN:
(Print) 9798350304367; 9798350304374
In pervasive computing environments, learning the causal network of relationships between environmental variables is crucial to support situation recognition and planning. However, this may be impossible when computing nodes have only partial observability and control, as in the case of multiple fog nodes each sensing only a local portion of variables and controlling a local portion of actuators. In fact, the causal network learnt by individual nodes may not suffice to give them full awareness and control over the environment. In this paper, we propose a protocol for distributed causal discovery, where fog nodes in an environment cooperate with each other to expand their individual local knowledge of the causal network and acquire the minimal knowledge of the causal network needed to achieve awareness and control. We evaluate our approach in a smart home scenario, showing its superior performance with respect to global causal learning with full observability.
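As an illustration of the cooperation the protocol enables, the sketch below lets a fog node merge, from its peers, only the causal edges incident to the variables on its observation boundary, expanding its local causal graph just enough to reason about what it must control. The merge rule and variable names are assumptions made for illustration, not the paper's protocol.

def expand_knowledge(local_edges, boundary_vars, peers):
    """Merge peer causal edges that touch this node's boundary variables."""
    known = set(local_edges)
    for peer_edges in peers:
        for (cause, effect) in peer_edges:
            if cause in boundary_vars or effect in boundary_vars:
                known.add((cause, effect))
    return known

# Two fog nodes in a smart home, each with a partial causal graph
node_a = {("motion", "light"), ("light", "lux")}
node_b = {("blind", "lux"), ("lux", "hvac")}

# Node A only needs edges around its boundary variable "lux"
merged = expand_knowledge(node_a, boundary_vars={"lux"}, peers=[node_b])
print(sorted(merged))  # node A now sees that "blind" also influences "lux"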
ISBN:
(Print) 9798350361360; 9798350361353
To promote the adoption of the edge paradigm, our community needs innovative approaches for geo-distributing cloud applications across multiple locations without modifying existing business logic. While recent efforts propose using external services to orchestrate REST operations and achieve geo-distribution, relying solely on resource sharing and replication limits how finely the manipulated resources can be distributed. This paper introduces a novel collaboration method that extends resources across multiple instances, going beyond simple replication. Our approach employs a shard-like strategy, enabling the creation of a distributed resource with a unified state view while mitigating synchronization overhead. The effectiveness of our mechanism is demonstrated through a proof-of-concept implemented on top of the Kubernetes ecosystem.
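The sketch below illustrates the shard-like idea under stated assumptions: a logical resource is split across geo-distributed instances, each owning one shard, and a unified view is built by merging shard states on read instead of replicating the whole resource everywhere. Class and key names are hypothetical; the paper's proof-of-concept targets Kubernetes resources.

class Shard:
    def __init__(self, owner: str):
        self.owner = owner
        self.state = {}            # the slice of the resource this instance owns

    def update(self, key, value):
        self.state[key] = value    # local write: no cross-site synchronization

def unified_view(shards):
    """Merge shard states into one logical view of the distributed resource."""
    view = {}
    for shard in shards:
        view.update(shard.state)   # shards own disjoint keys, so no conflicts
    return view

paris, berlin = Shard("paris"), Shard("berlin")
paris.update("replicas/eu-west", 3)
berlin.update("replicas/eu-central", 5)
print(unified_view([paris, berlin]))  # one state view spanning both instances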
ISBN:
(Print) 9798350339864
With high scalability and flexibility, serverless computing is becoming the most promising computing model. Existing serverless computing platforms initiate a container for each function invocation, which leads to a huge waste of computing resources. Our examinations reveal that (i) executing invocations concurrently within a single container can provide performance comparable to that of multiple containers (i.e., traditional approaches); (ii) redundant resources generated within a container result in memory waste, which prolongs the execution time of function invocations. Motivated by these observations, we propose FaaSBatch, a serverless framework that reduces invocation latency and saves scarce computing resources. In particular, FaaSBatch first classifies concurrent function requests into different function groups according to the invocation information. Next, FaaSBatch batches the invocations of each group, aiming to minimize resource utilization. Then, FaaSBatch utilizes an inline parallel policy to map each group of batched invocations into a single container. Finally, FaaSBatch expands and executes invocations of containers in parallel. To further reduce invocation latency and resource utilization, within each container, FaaSBatch reuses redundant resources created during function execution. We conduct extensive experiments based on Azure traces to evaluate the effectiveness and performance of FaaSBatch, comparing it with three state-of-the-art schedulers: Vanilla, SFS, and Kraken. Our experimental results show that FaaSBatch substantially reduces invocation latency and resource overhead. For instance, when executing I/O functions, FaaSBatch reduces the invocation latency of Vanilla, SFS, and Kraken by up to 92.18%, 89.54%, and 90.65%, respectively; FaaSBatch also reduces the resource overhead of Vanilla, SFS, and Kraken by 58.89% to 94.77%, 43.72% to 90.39%, and 42.99% to 78.88%, respectively.
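The grouping-and-batching pipeline described above can be sketched in a few lines: concurrent requests are classified by function, and each group is executed with inline parallelism inside one worker, standing in for one container. This is an illustrative sketch of the concept with a thread pool, not the FaaSBatch implementation.

from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def group_invocations(requests):
    """Classify concurrent requests into per-function groups."""
    groups = defaultdict(list)
    for func_name, payload in requests:
        groups[func_name].append(payload)
    return groups

def run_group_in_one_container(func, payloads):
    """Execute a batched group inside a single worker, in parallel."""
    with ThreadPoolExecutor(max_workers=len(payloads)) as pool:
        return list(pool.map(func, payloads))

# Hypothetical burst of concurrent invocations for two functions
requests = [("resize", "img1"), ("resize", "img2"), ("ocr", "doc1")]
handlers = {"resize": lambda p: f"resized:{p}", "ocr": lambda p: f"text:{p}"}

for name, payloads in group_invocations(requests).items():
    print(name, run_group_in_one_container(handlers[name], payloads))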
Blinky Blocks are cubic modular robots, which communicate with their neighbors through their faces, and change color using LEDs. We previously used sets of Blinky Blocks to display images on the basis of one Blinky Bl...
ISBN:
(Print) 9798350322392
Managing digital and connected systems has become increasingly challenging in the past decade due to their scale and complexity. A new perspective is required to manage these systems, considering the infrastructure and components from edge to cloud, i.e., in the distributed computing continuum. Serverless computing offers improved scalability and cost efficiency, but balancing and coordinating serverless systems remains complex. Intent-based systems, popular in networking, can provide a solution by translating stakeholder inputs into actions that meet Service Level Objectives (SLOs). Their application in the computing continuum could be highly beneficial, but it has yet to be deeply explored. To bridge this gap, we propose a methodology for deploying an intent-based system for the computing continuum. We implement an architectural framework leveraging the serverless paradigm. Furthermore, we focus on defining and implementing the main components for translating management requirements into actions executed by serverless functions, inspired by a three-layer model. Through a Proof of Concept (PoC) deployed in Amazon's AWS cloud and detailed simulations, we showcase how such an approach can resolve conflicts in a complex system, i.e., balance efficiency and availability. Our work aims to contribute to effectively managing the computing continuum and to highlight the potential of intent-based systems in this domain. The experiments' results show our framework's ability to make appropriate scaling decisions, fulfilling both objectives.
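The core translation step, from a stakeholder intent to SLOs to a concrete scaling action, can be sketched as follows. The thresholds, field names, and the priority rule that arbitrates the efficiency/availability conflict are assumptions for illustration, not the paper's implementation.

from dataclasses import dataclass

@dataclass
class Intent:
    name: str
    max_latency_ms: float      # availability-oriented objective
    max_cost_per_hour: float   # efficiency-oriented objective

def decide_action(intent: Intent, latency_ms: float, cost_per_hour: float) -> str:
    """Resolve the efficiency/availability conflict with a simple priority rule."""
    if latency_ms > intent.max_latency_ms:
        return "scale_out"     # availability SLO violated: add function instances
    if cost_per_hour > intent.max_cost_per_hour:
        return "scale_in"      # efficiency SLO violated: remove instances
    return "hold"              # both objectives fulfilled

intent = Intent("checkout-service", max_latency_ms=200, max_cost_per_hour=5.0)
print(decide_action(intent, latency_ms=350, cost_per_hour=3.2))  # -> scale_out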