ISBN: (Print) 9798400701238
The proceedings contain 50 papers. The topics discussed include: semantic privacy-preserving for video surveillance services on the edge; distributed tracking and verifying: a real-time and high-accuracy visual tracking edge computing framework for Internet of Things; Octopus: in-network content adaptation to control congestion on 5G links; experimental test-bed for computation offloading for cooperative inference on edge devices; on balancing latency and quality of edge-native multi-view 3D reconstruction; RAVAS: interference-aware model selection and resource allocation for live edge video analytics; democratizing drone autonomy via edge computing; energy time fairness: balancing fair allocation of energy and time for GPU workloads; unveiling energy efficiency in deep learning: measurement, prediction, and scoring across edge devices; and bang for the buck: evaluating the cost-effectiveness of heterogeneous edge platforms for neural network workloads.
In this paper, we study the minimum dominating set (MDS) problem and the minimum total dominating set (MTDS) problem. We propose a new idea to compute approximate MDS and MTDS. This new approach can be implemented in ...
This paper proposes an algorithm-specific instruction (ASI)-based fast Fourier transform (FFT) code generation framework, named FFTASI, to generate unified, architecture-independent butterfly kernels that can be transf...
ISBN: (Print) 9798400704130
Deep Learning (DL) has seen rapid adoption in all domains. Since training DL models is expensive, both in terms of time and resources, application workflows that make use of DL increasingly need to operate with a large number of derived learning models, which are obtained through transfer learning and fine-tuning. At scale, thousands of such derived DL models are accessed concurrently by a large number of processes. In this context, an important question is how to design and develop specialized DL model repositories that remain scalable under concurrent access, while addressing key challenges: how to query the DL model architectures for specific patterns? How to load/store a subset of layers/tensors from a DL model? How to efficiently share unmodified layers/tensors between DL models derived from each other through transfer learning? How to maintain provenance and answer ancestry queries? The state of the art leaves a gap regarding these challenges. To fill this gap, we introduce EvoStore, a distributed DL model repository with scalable data and metadata support to store and access derived DL models efficiently. Large-scale experiments on hundreds of GPUs show significant benefits over the state of the art with respect to I/O and metadata performance, as well as storage space utilization.
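One of the abstract's key ideas, sharing unmodified layers between models derived through fine-tuning, can be illustrated with a content-addressed store. The sketch below is hypothetical (it is not EvoStore's actual design, and it omits shape/dtype metadata for brevity): each tensor is keyed by a digest of its contents, so layers untouched by fine-tuning are stored once and referenced by every derived model.

```python
import hashlib
import numpy as np

class DerivedModelStore:
    """Toy content-addressed model store: tensors shared between a base
    model and its fine-tuned derivatives are deduplicated by digest."""

    def __init__(self):
        self.blobs = {}    # digest -> tensor payload
        self.models = {}   # model name -> {layer name -> digest}

    def put(self, model_name, layers):
        refs = {}
        for layer_name, tensor in layers.items():
            digest = hashlib.sha256(tensor.tobytes()).hexdigest()
            # Store the payload only if no identical tensor exists yet.
            self.blobs.setdefault(digest, tensor.copy())
            refs[layer_name] = digest
        self.models[model_name] = refs

    def get(self, model_name, layer_name):
        return self.blobs[self.models[model_name][layer_name]]
```

A fine-tuned model that keeps the base model's embedding layer then adds only one new blob to the store, which is the storage-sharing effect the abstract describes.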
ISBN: (Print) 9781450399265
Fair resource allocation is one of the most important topics in communication networks. Existing solutions almost exclusively assume that each user's utility function is known and concave. This paper seeks to answer the following question: how to allocate resources when utility functions are unknown, even to the users? The answer has become increasingly important in next-generation AI-aware communication networks, where user utilities are complex and their closed forms are hard to obtain. In this paper, we provide a new solution using a distributed and data-driven bilevel optimization approach, where the lower level is a distributed network utility maximization (NUM) algorithm with concave surrogate utility functions, and the upper level is a data-driven learning algorithm that finds the best surrogate utility functions, i.e., those maximizing the sum of true network utility. The proposed algorithm learns from data samples (utility values or gradient values) to autotune the surrogate utility functions to maximize the true network utility, and therefore works even when utility functions are unknown. For the general network, we establish the nonasymptotic convergence rate of the proposed algorithm with nonconcave utility functions. The simulations validate our theoretical results and demonstrate the effectiveness of the proposed method in a real-world network.
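The bilevel structure described above can be sketched in a few lines. The sketch below is a minimal illustration under strong simplifying assumptions (a single shared link, weighted-log surrogates with a closed-form lower-level solution, and a finite-difference upper-level gradient estimate); it is not the paper's algorithm, but it shows how an outer loop can tune surrogate parameters using only sampled values of unknown, possibly nonconcave true utilities.

```python
import numpy as np

def num_allocate(weights, capacity):
    # Lower level: maximize sum_i w_i * log(x_i) subject to sum_i x_i <= capacity.
    # With log surrogates this weighted-fair NUM has a closed form.
    return capacity * weights / weights.sum()

def autotune(true_utils, capacity, steps=300, lr=0.05, eps=1e-4):
    # Upper level: treat the true utilities as black boxes, query only their
    # values, and ascend a finite-difference estimate of the gradient of the
    # total true utility with respect to the surrogate weights.
    w = np.ones(len(true_utils))

    def total(wv):
        x = num_allocate(wv, capacity)
        return sum(u(xi) for u, xi in zip(true_utils, x))

    for _ in range(steps):
        g = np.zeros_like(w)
        for i in range(len(w)):
            wp = w.copy()
            wp[i] += eps
            g[i] = (total(wp) - total(w)) / eps
        w = np.maximum(w + lr * g, 1e-6)  # keep weights positive
    return w
```

With a concave and a saturating (nonconcave-friendly) user, the tuned weights shift capacity away from the saturated user and raise the total true utility relative to an equal split.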
ISBN: (Print) 9798400704352
Protein structure prediction helps to understand gene translation and protein function, which is of growing interest and importance in structural biology. The AlphaFold model, which used a transformer architecture to achieve atomic-level accuracy in protein structure prediction, was a significant breakthrough. However, training and inference of the AlphaFold model are challenging due to its high computation and memory cost. In this work, we present FastFold, an efficient implementation of AlphaFold for both training and inference. We propose Dynamic Axial Parallelism (DAP) as a novel model parallelism method. Additionally, we have implemented a series of low-level optimizations aimed at reducing communication, computation, and memory costs. These optimizations include Duality Async Operations, highly optimized kernels, and AutoChunk (an automated search algorithm that finds the best chunk strategy to reduce memory peaks). Experimental results show that FastFold can efficiently scale to more GPUs using DAP, reducing overall training time from 11 days to 67 hours, and achieves a 7.5x to 9.5x speedup for long-sequence inference. Furthermore, AutoChunk can reduce memory cost by over 80% during inference by automatically partitioning the intermediate tensors during the computation.
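The memory-saving idea behind chunking intermediate tensors can be illustrated on a plain attention computation. The sketch below is a generic NumPy illustration, not FastFold's AutoChunk (which searches for chunk strategies automatically): processing query rows in blocks means only a (chunk, n) slice of the score matrix is live at a time instead of the full (n, n) intermediate, at the cost of a loop.

```python
import numpy as np

def attention(q, k, v):
    # Unchunked reference: materializes the full (n, n) score matrix at once.
    s = q @ k.T
    p = np.exp(s - s.max(axis=1, keepdims=True))
    return (p / p.sum(axis=1, keepdims=True)) @ v

def attention_chunked(q, k, v, chunk):
    # Chunked variant: each iteration materializes only a (chunk, n) slice
    # of the scores, lowering peak memory without changing the result.
    out = []
    for i in range(0, q.shape[0], chunk):
        s = q[i:i + chunk] @ k.T
        p = np.exp(s - s.max(axis=1, keepdims=True))
        out.append((p / p.sum(axis=1, keepdims=True)) @ v)
    return np.concatenate(out, axis=0)
```

Because the row-wise softmax of one query block is independent of the others, the chunked and unchunked results match exactly; the chunk size is purely a memory/looping trade-off.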
ISBN: (Print) 9798400702297
Large deep learning models have recently garnered substantial attention from both academia and industry. Nonetheless, frequent failures are observed during large model training due to the large scale of resources involved and the extended training time. Existing solutions have significant failure recovery costs due to the severe restriction imposed by the bandwidth of the remote storage in which they store checkpoints. This paper presents Gemini, a distributed training system that enables fast failure recovery for large model training by checkpointing to the CPU memory of the host machines, which has much larger aggregated bandwidth. However, two challenges prevent naively checkpointing to CPU memory. First, the availability of checkpoints in CPU memory cannot be guaranteed when failures occur. Second, since the communication traffic for training and checkpointing shares the same network, checkpoint traffic can interfere with training traffic and harm training throughput. To address these two challenges, this paper proposes: 1) a provably near-optimal checkpoint placement strategy to maximize the probability of failure recovery from checkpoints in CPU memory; and 2) a checkpoint traffic scheduling algorithm to minimize, if not eliminate, the interference of checkpoint traffic on model training. Our evaluation shows that overall Gemini recovers from failures more than 13x faster than existing solutions. Moreover, it achieves the optimal checkpoint frequency, i.e., every iteration, and incurs no overhead on training throughput for large model training.
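The first challenge above, keeping in-memory checkpoints available despite machine failures, boils down to replica placement. The toy sketch below is not Gemini's placement strategy; it only illustrates the underlying invariant with a simple ring scheme: if each machine's checkpoint shard is replicated on the next r-1 machines, any r-1 concurrent machine failures still leave at least one live copy of every shard.

```python
def ring_placement(n, r):
    # Machine i's checkpoint shard lives locally and on the next r-1
    # machines in a ring of n machines.
    return {i: [(i + j) % n for j in range(r)] for i in range(n)}

def recoverable(placement, failed):
    # Recovery is possible iff every shard has at least one surviving host.
    failed = set(failed)
    return all(any(h not in failed for h in hosts)
               for hosts in placement.values())
```

With n = 6 and r = 2, every single-machine failure is recoverable, but losing two adjacent machines destroys both copies of one shard, which is why placement (and not just replica count) matters for the recovery probability the abstract optimizes.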
Leveraging serverless computing for cloud-based machine learning services is on the rise, promising the cost-efficiency and flexibility that are crucial for ML applications relying on high-performance GPUs and substantial memo...
ISBN: (Print) 9798400704239
Computer science (CS) and information technology (IT) curricula are grounded in theoretical and technical skills. Topics like equity and inclusive design are rarely found in mainstream student studies. This results in graduates with outdated practices and limitations in software development. A research project was conducted to educate the faculty to integrate inclusive software design into the CS undergraduate curriculum. The objective is to produce graduates with the ability to develop inclusive software. This experience report presents the results of teaching inclusive design throughout the four-year CS and IT curriculum, focusing on the impact on faculty. This easy-to-adopt, high-impact approach improved student retention and classroom climate, broadening participation. Research questions address faculty understanding of inclusive software design, the approach's feasibility, improvement in students' ability to design equitable software, and assessment of the culture of inclusiveness for students in computing programs. Faculty attended a summer workshop to learn about inclusive design and update their teaching materials to include the GenderMag method. Beginning in CS0 and CS1 and continuing through the senior capstone, faculty used updated course assignments to include inclusive design in 10 courses across 44 taught sections. Faculty outcomes are positive; planning to include inclusive design and working with other department faculty proved the most engaging activities. Faculty were impressed by student ownership and adoption of inclusive design methods, particularly in the culminating capstone senior project.
ISBN: (Print) 9798400701320
How can we let users adapt video-based meetings as easily as they rearrange furniture in a physical meeting room? We describe a design space for video conferencing systems that includes a five-step "ladder of tailorability," from minor adjustments to live reprogramming of the interface. We then present Mirrorverse and show how it applies the principles of computational media to support live tailoring of video conferencing interfaces to accommodate highly diverse meeting situations. We present multiple use scenarios, including a virtual workshop, an online yoga class, and a stand-up team meeting, to evaluate the approach and demonstrate its potential for new, remote meetings with fluid transitions across activities.