Recent years have witnessed increasing interest in machine learning (ML) inference on serverless computing due to its auto-scaling and cost-effective properties. However, one critical aspect, function granularity, has been largely overlooked, limiting the potential of serverless ML. This paper explores the impact of function granularity on serverless ML, revealing its important effects on the SLO hit rates and resource costs of serverless applications. It further proposes adaptive granularity as an approach to addressing the fact that no single granularity fits all applications and situations. It explores three predictive models and presents programming tools and runtime extensions to facilitate the integration of adaptive granularity into existing serverless platforms. Experiments show that adaptive granularity produces up to a 29.2% improvement in SLO hit rates and up to a 24.6% reduction in resource costs over state-of-the-art serverless ML systems that use a fixed granularity.
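To make the granularity tradeoff concrete, here is a minimal Python sketch (our illustration, not the paper's design) in which a hypothetical predictor picks between a coarse-grained monolithic function and fine-grained per-stage functions; the latency/cost models and all constants are invented.

def predict_latency_ms(granularity: str, rps: float) -> float:
    """Toy latency model: a coarse function avoids inter-function hops but
    queues up under bursts; fine-grained stages add invocation overhead
    but scale each stage independently. Constants are invented."""
    if granularity == "coarse":
        return 80 + 0.5 * rps          # queuing grows with load
    return 120 + 0.1 * rps             # per-hop overhead, flatter scaling

def predict_cost(granularity: str, rps: float) -> float:
    """Toy resource-cost model in arbitrary units (e.g., GB-seconds)."""
    return {"coarse": 1.0, "fine": 0.8}[granularity] * rps

def choose_granularity(rps: float, slo_ms: float) -> str:
    """Prefer the cheaper granularity among those predicted to meet the SLO;
    fall back to the lower-latency one if neither meets it."""
    options = ("coarse", "fine")
    feasible = [g for g in options if predict_latency_ms(g, rps) <= slo_ms]
    if feasible:
        return min(feasible, key=lambda g: predict_cost(g, rps))
    return min(options, key=lambda g: predict_latency_ms(g, rps))

for rps in (20, 200, 800):
    print(rps, choose_granularity(rps, slo_ms=150))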
This paper examines a continuous-time routing system with general interarrival and service time distributions, operating under either the join-the-shortest-queue policy or the power-of-two-choices policy. Under a weaker set of assumptions than those commonly imposed in the literature, we prove that the scaled steady-state queue length at each station converges weakly to a common exponential random variable in heavy traffic. Specifically, our results hold assuming only a finite (2+ε)th moment of the interarrival and service time distributions for some ε > 0. The proof leverages the Palm version of the basic adjoint relationship (BAR) as a key technique.
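Stated schematically in our own notation (the paper's exact constants and conditions differ), the kind of heavy-traffic limit described is:

\[
(1-\rho)\,\bigl(Q_1(\infty),\dots,Q_N(\infty)\bigr) \;\Longrightarrow\; (E,\dots,E) \qquad \text{as } \rho \uparrow 1, \qquad E \sim \operatorname{Exp}(\theta),
\]

where \(Q_i(\infty)\) is the steady-state queue length at station \(i\), \(\rho\) is the traffic intensity, and \(\theta\) is determined by the first two moments of the interarrival and service time distributions; the only moment condition required is \(\mathbb{E}[X^{2+\epsilon}]<\infty\) for some \(\epsilon>0\) for both distributions.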
We consider a model inspired by compatibility constraints that arise between tasks and servers in data centers, cloud computing systems, and content delivery networks. The constraints are represented by a bipartite graph or network that interconnects dispatchers with compatible servers. Each dispatcher receives tasks over time and sends every task to a compatible server with the least number of tasks, or to a server with the least number of tasks among d compatible servers selected uniformly at random. We focus on networks where the neighborhood of at least one server is skewed in a limiting regime: the neighborhood contains a diverging number of dispatchers, each compatible with a uniformly bounded number of servers; thus, the degree of the central server approaches infinity while the degrees of many neighboring dispatchers remain bounded. We prove that each server with a skewed neighborhood saturates, in the sense that the mean number of tasks queueing in front of it in steady state approaches infinity. Paradoxically, this pathological behavior can arise even in random networks where nearly all the servers have at most one task in the limit.
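A minimal slotted-time simulation (all parameters invented) mimics the skewed topology: server 0 is compatible with every dispatcher, while each dispatcher also has one dedicated server, so dispatcher degrees stay bounded at 2 as the central server's degree grows. Under least-loaded routing, the central server's mean queue length grows with the number of dispatchers, in line with the saturation result (the growth can be slow):

import random

def simulate(num_dispatchers=200, steps=50000, arrival_p=0.5, service_p=0.6):
    # Server 0 is the central server, compatible with every dispatcher;
    # server i (i >= 1) is dedicated to dispatcher i.
    queues = [0] * (num_dispatchers + 1)
    central_samples = 0
    for _ in range(steps):
        for d in range(1, num_dispatchers + 1):
            if random.random() < arrival_p:
                # Least-loaded routing between the two compatible servers.
                target = 0 if queues[0] <= queues[d] else d
                queues[target] += 1
        for s in range(num_dispatchers + 1):
            if queues[s] > 0 and random.random() < service_p:
                queues[s] -= 1
        central_samples += queues[0]
    return central_samples / steps  # time-average central queue length

for n in (10, 50, 200):
    print(n, round(simulate(num_dispatchers=n), 1))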
This work highlights the significance of the I/O bottlenecks that data-intensive HPC workflows face in serverless environments - an issue that has been largely overlooked by prior work. To address this challenge, we propose StarShip, a novel framework that mitigates I/O bottlenecks for HPC workflows executing in serverless environments by leveraging different storage options and multi-tier functions, co-optimizing for service time and service cost. StarShip exploits the Levenberg-Marquardt optimization method to find an effective solution in a large, complex search space. StarShip achieves significantly better performance and cost than competing techniques, improving service time by 45% and service cost by 37.6% on average over state-of-the-art solutions.
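As a rough illustration of the optimization step (not StarShip's actual models), the sketch below feeds placeholder service-time and service-cost models over two continuous knobs to SciPy's Levenberg-Marquardt solver; all model forms and constants are invented:

import numpy as np
from scipy.optimize import least_squares

# Placeholder models: service time falls and cost rises with function
# memory (GB) and with the share of data kept in the fast storage tier.
def residuals(x, w_time=1.0, w_cost=1.0):
    mem, fast_share = x
    time_s = 30.0 / mem + 20.0 * (1.0 - fast_share)   # toy service time
    cost = 0.05 * mem + 0.8 * fast_share              # toy service cost
    return [w_time * time_s, w_cost * cost]           # jointly minimized

# Levenberg-Marquardt (method="lm") runs unconstrained; in this toy
# setting we clamp the solution to a feasible configuration afterwards.
sol = least_squares(residuals, x0=[1.0, 0.5], method="lm")
mem, fast_share = sol.x
print(f"memory={max(mem, 0.125):.2f} GB, "
      f"fast-tier share={min(max(fast_share, 0.0), 1.0):.2f}")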
Today's software is bloated with both code and features that are not used by most users. This bloat is prevalent across the entire software stack, from operating systems and applications to containers. Containers are lightweight virtualization technologies used to package code and dependencies, providing portable, reproducible, and isolated environments. Because of this ease of use, data scientists often rely on machine learning containers to simplify their workflows. However, this convenience comes at a cost: containers are often bloated with unnecessary code and dependencies, resulting in very large sizes. In this paper, we analyze and quantify bloat in machine learning containers. We develop MMLB, a framework for analyzing bloat in software systems, focusing on machine learning containers. MMLB measures the amount of bloat at both the container and package levels and quantifies the sources of bloat. In addition, MMLB integrates with vulnerability analysis tools and performs package dependency analysis to evaluate the impact of bloat on container vulnerabilities. Through experiments with 15 machine learning containers from TensorFlow, PyTorch, and Nvidia, we show that bloat accounts for up to 80% of machine learning container sizes, increasing container provisioning times by up to 370% and exacerbating vulnerabilities by up to 99%.
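A minimal sketch of the package-level bloat metric (our illustration; MMLB's actual analysis is more involved): given a package's install directory and the set of files observed to be accessed at runtime, report the fraction of installed bytes that were never touched:

import os

def package_bloat(package_root: str, accessed_files: set) -> float:
    """Fraction of a package's installed bytes never accessed at runtime.
    In a real pipeline, `accessed_files` would come from a filesystem
    trace (e.g., collected with fanotify or strace)."""
    total = used = 0
    for dirpath, _, filenames in os.walk(package_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            total += size
            if path in accessed_files:
                used += size
    return 0.0 if total == 0 else 1.0 - used / total

# Hypothetical usage, assuming a trace showed only one file was ever read:
# print(package_bloat("/usr/lib/python3/dist-packages/requests",
#                     {"/usr/lib/python3/dist-packages/requests/__init__.py"}))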
Today's distributed machine learning (DML) introduces heavy traffic load, making the interconnection network one of the primary bottlenecks. To mitigate this bottleneck, state-of-the-art network optimization methods, such as traffic or topology engineering, adapt to real-time traffic. However, current traffic measurement and prediction methods struggle to collect sufficiently fine-grained and accurate traffic patterns. This limitation impedes the ability of cutting-edge network optimization techniques to react agilely to the ever-changing traffic demands of DML jobs. This paper proposes NetJIT, a novel program-behavior-aware toolkit for accurately foreseeing the traffic patterns of DML. To the best of our knowledge, this is the first work to propose just-in-time (JIT) program analysis for real-time traffic measurement. In DML applications, communication behavior is primarily determined by previously computed results. NetJIT leverages this characteristic to anticipate communication details by tracing and analyzing data relations in the computation process, which enables optimization strategies to be deployed in advance. We deploy NetJIT in real-world network optimization to provide traffic foreknowledge. Evaluation on our testbed prototype demonstrates that NetJIT reduces the error in detecting communication events by up to about 97% compared with other methods. Simulations with real-world DML workloads further show that NetJIT enables more precise network optimization, yielding approximately 50% better network performance with respect to metrics including average iteration time, throughput, and average packet delay.
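The core observation can be illustrated with a toy sketch (ours, not NetJIT's JIT analysis): once the gradients of an iteration have been computed, the byte count of the upcoming all-reduce is fully determined by their shapes and can be announced to the network layer before the collective is issued. Shapes and dtypes below are invented:

DTYPE_BYTES = {"float32": 4, "float16": 2}

def numel(shape):
    n = 1
    for dim in shape:
        n *= dim
    return n

def predict_allreduce_bytes(grad_shapes, dtype="float32"):
    # The collective's payload is fixed once the gradient shapes are
    # known, so the prediction can be emitted before communication starts.
    return sum(numel(s) for s in grad_shapes) * DTYPE_BYTES[dtype]

# Gradient shapes of a small two-layer model: weights and biases (invented).
grads = [(1024, 512), (512,), (512, 10), (10,)]
print("announce to fabric:", predict_allreduce_bytes(grads), "bytes")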
In blockchains using the Proof-of-Work (PoW) consensus mechanism, a mining pool is a group of miners who combine their computational resources and share the generated revenue. Similarly, when the Proof-of-Stake (PoS) consensus mechanism is adopted, the staking pool imitates the design of the mining pool by aggregating stakes. However, in PoW blockchains, the pooling approach has been criticized as vulnerable to the block withholding (BWH) attack: BWH attackers may steal dividends from victims by pretending to work while making invalid contributions to the victim pools. It is well known that BWH attackers against PoW face the miner's dilemma. To our knowledge, despite the popularity of PoS, we are the first to study the pool BWH attack against PoS. Interestingly, we find that, in a network consisting of one attacker pool and one victim pool, the attacker eventually dominates the network while the victim gradually loses its stake ratio and vanishes. Moreover, in a more realistic scenario with multiple BWH attacker pools and one solo staker who joins no pool, we show that only one lucky attacker and the solo staker survive, whereas all the other pools gradually vanish, revealing the staker's dilemma. These findings indicate that, compared to PoW, the BWH attack on PoS has a much more severe impact due to the attacker's resource aggregation advantage. Our analysis is supported by experiments on large-scale real blockchain systems and by numerical simulations.
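A toy discrete-time model (our construction, not the paper's formulation) reproduces the qualitative finding for one attacker pool and one victim pool: the infiltrating stake earns dividends from the victim while producing no valid blocks, and because PoS rewards compound into stake, the victim's stake ratio decays toward zero:

# Toy BWH-on-PoS dynamics: one unit of reward per round is granted in
# proportion to stake that actually produces valid blocks; withheld work
# earns the attacker a dividend share of the victim's reward while
# contributing nothing. All parameters are invented.

def simulate(rounds=2000, infiltration=0.3):
    attacker, victim = 0.5, 0.5               # initial stake ratios
    for _ in range(rounds):
        infiltrating = infiltration * attacker
        honest_attacker = attacker - infiltrating
        producing = honest_attacker + victim  # only valid work makes blocks
        r_attacker = honest_attacker / producing
        r_victim = victim / producing
        # The victim splits its reward over all stake it believes
        # contributed, including the withholding infiltrators.
        dividend = r_victim * infiltrating / (victim + infiltrating)
        attacker += r_attacker + dividend
        victim += r_victim - dividend
        total = attacker + victim             # renormalize to stake ratios
        attacker, victim = attacker / total, victim / total
    return attacker, victim

print(simulate())  # the victim's stake ratio decays toward zero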
Confidential computing is gaining traction in the cloud, driven by increasing security and privacy concerns across various industries. Recent trusted hardware advancements introduce Confidential Virtual Machines (CVMs) to alleviate the programmability and usability challenges of previously proposed enclave-based trusted computing technologies. CVM hardware extensions provide secure, hardware-isolated encrypted VMs, promoting programmability and easier deployment in cloud infrastructures. However, differing microarchitectural features, interfaces, and security properties among hardware vendors complicate the evaluation of CVMs for different use cases. Understanding the performance implications, functional limitations, and security guarantees of CVMs is a crucial step toward their adoption. This paper presents a detailed empirical analysis of two leading CVM technologies: AMD Secure Encrypted Virtualization-Secure Nested Paging (SEV-SNP) and Intel Trust Domain Extensions (TDX). We review their microarchitectural components and conduct a thorough performance evaluation across various aspects, including memory management, computational performance, storage and network stacks, and attestation primitives. We further present a security analysis through a trusted computing base (TCB) evaluation and an analysis of Common Vulnerabilities and Exposures (CVEs). Our key findings demonstrate, among other things, the effect of CVMs on boot time, memory management, and I/O, and identify inefficiencies in their context-switch mechanisms. We further provide insights into the performance implications of CVMs and highlight potential room for improvement.
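One of the flagged aspects, guest context-switch overhead, is commonly probed with microbenchmarks of the following flavor (our sketch, not the paper's harness): time a tight loop of a cheap operation inside the CVM and compare against a regular VM. Which operations actually trigger exits to the host or firmware depends on the platform, so exit-heavy paths such as virtualized I/O or hypercalls should be measured the same way:

import os
import time

def syscall_latency_ns(iters=1_000_000):
    """Average latency of a cheap real syscall (getpid). Run the same
    loop inside a CVM and a regular VM; paths that transition out of the
    guest are where the extra context-switch cost shows up."""
    start = time.perf_counter_ns()
    for _ in range(iters):
        os.getpid()
    return (time.perf_counter_ns() - start) / iters

print(f"{syscall_latency_ns():.0f} ns per getpid()")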
We investigate the problem of stabilizing an unknown networked linear system under communication constraints and adversarial disturbances. We propose the first provably stabilizing algorithm for this problem. The algorithm uses a distributed version of nested convex body chasing to maintain a consistent estimate of the network dynamics and applies system level synthesis to determine a distributed controller based on this estimated model. Our approach avoids the need for system identification and accommodates a broad class of communication delays while being fully distributed and scaling favorably with the number of subsystems.
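A scalar, centralized toy (not the paper's distributed algorithm) illustrates the consistent-set idea behind nested convex body chasing: each observation of x_{t+1} = a*x_t + b*u_t + w_t with |w_t| <= W cuts the parameter set with two half-spaces, and the controller acts on a point chosen from what remains (here, a Chebyshev center found by a linear program); all constants are invented:

import numpy as np
from scipy.optimize import linprog

W = 0.1                      # known disturbance bound
A_ub, b_ub = [], []          # half-spaces: row @ (a, b) <= rhs

def observe(x, u, x_next):
    # |x_next - a*x - b*u| <= W gives two linear constraints on (a, b).
    A_ub.extend([[x, u], [-x, -u]])
    b_ub.extend([x_next + W, W - x_next])

def chebyshev_center():
    # Deepest point of the consistent set: max r s.t. row@p + r*||row|| <= rhs.
    rows = np.array(A_ub)
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    res = linprog(c=[0.0, 0.0, -1.0],
                  A_ub=np.hstack([rows, norms]), b_ub=b_ub,
                  bounds=[(-5, 5), (-5, 5), (0, None)])
    return res.x[:2]

rng = np.random.default_rng(0)
a_true, b_true, x, u = 1.2, 1.0, 5.0, 0.0   # open-loop unstable plant
for t in range(20):
    x_next = a_true * x + b_true * u + rng.uniform(-W, W)
    observe(x, u, x_next)
    a_hat, b_hat = chebyshev_center()
    x = x_next
    # Cancel the estimated dynamics; probe with u = 1 while b is ambiguous.
    u = -(a_hat / b_hat) * x if abs(b_hat) > 1e-6 else 1.0
print(f"final |x| = {abs(x):.3f}")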