Search Results - Inner Mongolia University Library

HiTDL: High-Throughput Deep Learning Inference at the Hybrid Mobile Edge

IEEE Transactions on Parallel and Distributed Systems, 2022, Vol. 33, No. 12, pp. 4499-4514

Authors: Wu, Jing; Wang, Lin; Pei, Qiangyu; Cui, Xingqi; Liu, Fangming; Yang, Tingting. Affiliations: Huazhong Univ Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China; Vrije Univ Amsterdam, NL-1081 HV Amsterdam, Netherlands; Tech Univ Darmstadt, D-64289 Darmstadt, Germany; Peng Cheng Lab, Shenzhen 518066, Peoples R China

Deep neural networks (DNNs) have become a critical component for inference in modern mobile applications, but the efficient provisioning of DNNs is non-trivial. Existing mobile- and server-based approaches compromise either inference accuracy or latency. Instead, a hybrid approach can reap the benefits of both by splitting the DNN at an appropriate layer and running the two parts separately on the mobile device and the server, respectively. Nevertheless, DNN throughput under the hybrid approach has not been carefully examined, which is particularly important for edge servers, where limited compute resources are shared among multiple DNNs. This article presents HiTDL, a runtime framework for managing multiple DNNs provisioned following the hybrid approach at the edge. HiTDL's mission is to improve edge resource efficiency by optimizing the combined throughput of all co-located DNNs while still guaranteeing their SLAs. To this end, HiTDL first builds comprehensive performance models for DNN inference latency and throughput with respect to multiple factors, including resource availability, DNN partition plan, and cross-DNN interference. HiTDL then uses these models to generate a set of candidate partition plans with SLA guarantees for each DNN. Finally, HiTDL makes global throughput-optimal resource allocation decisions by selecting one partition plan from the candidate set for each DNN via solving a fairness-aware multiple-choice knapsack problem. Experimental results based on a prototype implementation show that HiTDL improves the overall throughput of the edge by 4.3x compared with the state of the art.

Keywords: Deep learning inference; edge computing; resource allocation; systems for machine learning
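The last step described above is a multiple-choice knapsack: pick exactly one candidate partition plan per DNN so that the total resource demand fits the edge server while the summed throughput is maximal. The sketch below is a textbook dynamic program under assumed inputs (integer resource units per plan, one throughput number per plan, and a hypothetical `utility` hook where a fairness term could go, e.g. `math.log` for proportional fairness). It is not HiTDL's actual formulation; the abstract does not define its fairness term.

```python
from typing import Callable, List, Optional, Tuple

# Assumed plan representation: (resource units it needs, throughput it delivers).
Plan = Tuple[int, float]


def select_plans(
    candidates: List[List[Plan]],
    capacity: int,
    utility: Callable[[float], float] = lambda t: t,
) -> Optional[Tuple[float, List[int]]]:
    """Multiple-choice knapsack: choose exactly one plan per DNN within `capacity`.

    Returns (best total utility, chosen plan index per DNN), or None if infeasible.
    """
    neg_inf = float("-inf")
    # dp[c]: best utility of the DNNs processed so far using at most c resource units.
    dp = [0.0] * (capacity + 1)
    choices: List[List[int]] = []  # choices[i][c]: plan picked for DNN i at budget c
    for plans in candidates:
        new_dp = [neg_inf] * (capacity + 1)
        pick = [-1] * (capacity + 1)
        for c in range(capacity + 1):
            for j, (cost, tput) in enumerate(plans):
                if cost <= c and dp[c - cost] != neg_inf:
                    value = dp[c - cost] + utility(tput)
                    if value > new_dp[c]:
                        new_dp[c], pick[c] = value, j
        dp = new_dp
        choices.append(pick)
    if dp[capacity] == neg_inf:
        return None  # no combination of plans fits on the server
    # Walk backwards through the recorded choices to recover one plan per DNN.
    selected: List[int] = []
    c = capacity
    for plans, pick in zip(reversed(candidates), reversed(choices)):
        j = pick[c]
        selected.append(j)
        c -= plans[j][0]
    selected.reverse()
    return dp[capacity], selected
```

For two DNNs with plans `[(1, 10.0), (3, 25.0)]` and `[(2, 20.0), (4, 30.0)]` and a capacity of 5 units, `select_plans` returns `(45.0, [1, 0])`: the more expensive plan for the first DNN and the cheaper one for the second.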

Seraph: A Performance-Cost Aware Tuner for Training Reinforcement Learning Model on Serverless Computing

15th Asia-Pacific Workshop on Systems (APSys)

Authors: Han, Jinbo; Wei, Xingda; Chen, Rong; Chen, Haibo. Affiliation: Shanghai Jiao Tong Univ, Inst Parallel & Distributed Syst, SEIEE, Shanghai, Peoples R China

ISBN: (print) 9798400711053

Training reinforcement learning (RL) models is critical for various AI tasks. However, determining the hardware resources required for training RL models is challenging due to the interaction between the CPU and GPU and the variability inherent in training. The problem becomes even harder when deploying an RL training job on the cloud with serverless computing, as both the performance and the cost of training must be considered. Existing tuners, like Ray Tune, require users to provide a search space, which is error-prone and cannot search for a setup with a desired cost. We present Seraph, the first tuner for RL training that finds the hardware configuration with the best performance within a user-given cost boundary. Seraph explicitly models training performance by decomposing RL training and using a stochastic model to harness variability. Compared to Ray Tune, it finds the optimal configuration with a 71% reduction in tuning time.

Keywords: systems for machine learning; Reinforcement learning; Serverless Computing
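The idea of finding the best-performing configuration within a cost boundary while modeling run-to-run variability can be sketched with Monte Carlo sampling. Everything concrete here is an assumption for illustration: the `Config` space, the per-second prices, and the rollout/update timing formula are hypothetical, and scoring cost at the 95th percentile is just one way to "harness variability". Seraph's own stochastic model is not described in the abstract.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical per-second unit prices; placeholders, not real cloud pricing.
CPU_PRICE = 0.00002
GPU_PRICE = 0.0003


@dataclass(frozen=True)
class Config:
    cpus: int
    gpus: int

    def price_per_s(self) -> float:
        return self.cpus * CPU_PRICE + self.gpus * GPU_PRICE


def sample_iteration_s(cfg: Config, rng: random.Random) -> float:
    # Hypothetical decomposition of one RL iteration: CPU-bound rollouts followed by
    # a GPU-bound learner update, each scaled by random noise to model variability.
    rollout = 8.0 / cfg.cpus * rng.uniform(0.8, 1.3)
    update = 2.0 / cfg.gpus * rng.uniform(0.9, 1.2)
    return rollout + update


def estimate(cfg: Config, iterations: int, samples: int,
             rng: random.Random) -> Tuple[float, float]:
    """Monte Carlo estimate of (mean job time, 95th-percentile job cost)."""
    totals = sorted(
        sum(sample_iteration_s(cfg, rng) for _ in range(iterations))
        for _ in range(samples)
    )
    mean_time = sum(totals) / samples
    p95_time = totals[int(0.95 * (samples - 1))]
    return mean_time, p95_time * cfg.price_per_s()


def tune(configs: List[Config], budget: float, iterations: int = 100,
         samples: int = 200, seed: int = 0) -> Optional[Tuple[Config, float, float]]:
    """Fastest config (by mean time) whose p95 cost stays within `budget`."""
    rng = random.Random(seed)
    best = None
    for cfg in configs:
        mean_time, p95_cost = estimate(cfg, iterations, samples, rng)
        if p95_cost <= budget and (best is None or mean_time < best[1]):
            best = (cfg, mean_time, p95_cost)
    return best
```

Calling `tune([Config(c, g) for c in (2, 4, 8, 16) for g in (1, 2)], budget=1.0)` returns the fastest configuration together with its estimated time and cost, or `None` when no configuration fits the budget. Guarding the budget with a tail percentile rather than the mean trades some speed for a lower chance of overspending on a noisy job.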

Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs

28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2023)

Authors: Ding, Yaoyao; Yu, Cody Hao; Zheng, Bojian; Liu, Yizhi; Wang, Yida; Pekhimenko, Gennady. Affiliations: Univ Toronto, Toronto, ON, Canada; Amazon Web Serv, Santa Clara, CA, USA; Vector Inst, Toronto, ON, Canada

ISBN: (print) 9781450399166

As deep learning models are now widely adopted by both cloud services and edge devices, reducing the latency of deep learning model inference has become crucial to efficient model serving. However, it is challenging to develop efficient tensor programs for deep learning operators due to the high complexity of modern accelerators (e.g., NVIDIA GPUs and Google TPUs) and the rapidly growing number of operators. Deep learning compilers, such as Apache TVM, adopt declarative scheduling primitives to lower the bar for developing tensor programs. However, we show that this approach is insufficient to cover state-of-the-art tensor program optimizations (e.g., double buffering). In this paper, we propose to embed the scheduling process into tensor programs and use dedicated mappings, called task mappings, to define the computation assignment and ordering directly in the tensor programs. This new approach greatly enriches the expressible optimizations by allowing developers to manipulate tensor programs at a much finer granularity (e.g., allowing program-statement-level optimizations). We call the proposed method the task-mapping programming paradigm. In addition, we propose a new post-scheduling fusion optimization that allows developers to focus on scheduling every single operator and automates the fusion after scheduling. It greatly reduces the engineering effort for operator fusion. Our proposed paradigm also constructs an efficient hardware-centric schedule space, which is agnostic to the program input size and greatly reduces the tuning time. With the proposed paradigm, we implement a deep learning compiler, Hidet. Extensive experiments on modern convolution and transformer models show that Hidet outperforms the state-of-the-art DNN inference framework ONNX Runtime and the compiler TVM equipped with the AutoTVM and Ansor schedulers by up to 1.48x (1.22x on average). It also reduces the tuning time by 20x and 11x compared with AutoTVM and Ansor, respectively. We open-sou...

Keywords: deep learning systems; systems for machine learning; programming models; compilation; tensor computation
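A task mapping, as described above, assigns each worker (e.g., a GPU thread) an ordered list of tasks (e.g., output elements). The toy model below illustrates that idea with two primitives and composition. The names `spatial` and `repeat` echo the paper's terminology, but this is a hypothetical, self-contained sketch, not Hidet's actual API: `spatial` gives one task per worker, `repeat` has a single worker iterate over all tasks, and `outer * inner` treats each outer task as a block that the inner mapping distributes.

```python
from itertools import product
from typing import Callable, List, Tuple

Task = Tuple[int, ...]


class TaskMapping:
    """Maps each worker id to the ordered list of task indices it executes."""

    def __init__(self, task_shape: Tuple[int, ...], num_workers: int,
                 worker_tasks: Callable[[int], List[Task]]):
        self.task_shape = task_shape
        self.num_workers = num_workers
        self.worker_tasks = worker_tasks

    def __call__(self, worker: int) -> List[Task]:
        return self.worker_tasks(worker)

    def __mul__(self, inner: "TaskMapping") -> "TaskMapping":
        # Composition: every task of `self` becomes a block that `inner` distributes.
        outer = self
        shape = tuple(o * i for o, i in zip(outer.task_shape, inner.task_shape))

        def worker_tasks(w: int) -> List[Task]:
            w_outer, w_inner = divmod(w, inner.num_workers)
            return [
                tuple(o * s + i for o, s, i in zip(ot, inner.task_shape, it))
                for ot in outer(w_outer)
                for it in inner(w_inner)
            ]

        return TaskMapping(shape, outer.num_workers * inner.num_workers, worker_tasks)


def spatial(*shape: int) -> TaskMapping:
    """One task per worker, with workers laid out row-major over `shape`."""
    n = 1
    for s in shape:
        n *= s

    def worker_tasks(w: int) -> List[Task]:
        idx = []
        for s in reversed(shape):
            w, r = divmod(w, s)
            idx.append(r)
        return [tuple(reversed(idx))]

    return TaskMapping(tuple(shape), n, worker_tasks)


def repeat(*shape: int) -> TaskMapping:
    """A single worker executes every task in `shape`, in row-major order."""
    return TaskMapping(tuple(shape), 1,
                       lambda w: list(product(*(range(s) for s in shape))))
```

With `m = repeat(2, 2) * spatial(4, 8)`, 32 workers cover an 8x16 tile and worker 0 processes `(0, 0), (0, 8), (4, 0), (4, 8)` in that order. Because the assignment and ordering are explicit, the same mapping could be reused inside a kernel body at statement level, which is the kind of finer-grained control the abstract contrasts with declarative scheduling primitives.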

Towards a Robust Knowledge Graph-Enabled Machine Learning Service Description Framework

15th IEEE International Conference on Semantic Computing (ICSC)

Authors: Menik, Samiyuru; Ramaswamy, Lakshmish. Affiliation: Univ Georgia, Dept Comp Sci, Athens, GA 30602, USA

ISBN: (print) 9781728188997

Although machine learning (ML) is widely expected to become a key enabler of innovative applications in a number of important domains, building, deploying, and managing robust ML pipelines for diverse domains is very challenging, as it requires expertise in both the application domain and the ML field. Recently, machine learning as a service (MLaaS) has been explored as a paradigm to address these challenges and to democratize artificial intelligence (AI). MLaaS envisions an ecosystem with powerful mechanisms for publishing, searching/discovering, composing, and deploying ML models. This paper argues that a semantically rich and flexible ML service description is indispensable for realizing such an ecosystem. Toward this end, we outline a unique approach that leverages knowledge graphs (KGs) for ML service description. A requirements study highlights the aspects of ML services that need to be provisioned in the description framework. This paper presents a novel five-dimensional KG-enabled ML service description framework, which incorporates ML task description, input-output (I-O) description, ML model description, dataset and training description, and performance characteristics description. In designing this ML service description framework, we introduce several conceptual structures, such as functional specifications with semantically extended types and compound knowledge graphs for representing ML model architectures.

Keywords: Machine Learning; Machine Learning Applications; Machine Learning as a Service; Machine Learning Description; MLaaS; Semantic Knowledge Graph; Systems for Machine Learning
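The five description dimensions listed above map naturally onto a record that can be flattened into subject-predicate-object triples for a knowledge graph. The sketch below is a minimal illustration under stated assumptions: the field layout, the predicate names (`performsTask`, `hasInput`, and so on), and the in-memory list of tuples standing in for a real KG store are all hypothetical, not the paper's schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class MLServiceDescription:
    service_id: str
    task: str                          # 1. ML task description
    inputs: Dict[str, str]             # 2. I-O description: parameter name -> semantic type
    outputs: Dict[str, str]
    model: Dict[str, str]              # 3. ML model description
    dataset_training: Dict[str, str]   # 4. dataset and training description
    performance: Dict[str, float] = field(default_factory=dict)  # 5. performance characteristics

    def to_triples(self) -> List[Triple]:
        """Flatten the description into KG triples (hypothetical predicate names)."""
        s = self.service_id
        triples: List[Triple] = [(s, "performsTask", self.task)]
        triples += [(s, "hasInput", f"{name}:{typ}") for name, typ in self.inputs.items()]
        triples += [(s, "hasOutput", f"{name}:{typ}") for name, typ in self.outputs.items()]
        triples += [(s, f"model.{k}", v) for k, v in self.model.items()]
        triples += [(s, f"training.{k}", v) for k, v in self.dataset_training.items()]
        triples += [(s, f"perf.{k}", str(v)) for k, v in self.performance.items()]
        return triples


def find_services(graph: List[Triple], task: str, input_type: str) -> List[str]:
    """Discover services that perform `task` and accept an input of `input_type`."""
    by_task = {s for s, p, o in graph if p == "performsTask" and o == task}
    by_input = {s for s, p, o in graph
                if p == "hasInput" and o.endswith(":" + input_type)}
    return sorted(by_task & by_input)
```

A search such as `find_services(graph, "ImageClassification", "Image")` then returns the IDs of matching services. A production framework would more likely use typed ontologies (for example RDF/OWL classes for semantic types) so that discovery can match subtypes rather than exact strings.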

Kraken: Memory-Efficient Continual Learning for Large-Scale Real-Time Recommendations

International Conference on High Performance Computing, Networking, Storage and Analysis (SC)

Authors: Xie, Minhui; Ren, Kai; Lu, Youyou; Yang, Guangxu; Xu, Qingxing; Wu, Bihai; Lin, Jiazhen; Ao, Hongbo; Xu, Wanhong; Shu, Jiwu. Affiliations: Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China; Kuaishou Technol, Beijing, Peoples R China

ISBN: (print) 9781728199986

Modern recommendation systems in industry often use deep learning (DL) models that achieve better accuracy with more data and more model parameters. However, current open-source DL frameworks, such as TensorFlow and PyTorch, show relatively low scalability when training recommendation models with terabytes of parameters. To efficiently learn large-scale recommendation models from data streams that generate hundreds of terabytes of training data daily, we introduce a continual learning system called Kraken. Kraken contains a special parameter server implementation that dynamically adapts to the rapidly changing set of sparse features for the continual training and serving of recommendation models. Kraken provides a sparsity-aware training system that uses different learning optimizers for dense and sparse parameters to reduce memory overhead. Extensive experiments using real-world datasets confirm the effectiveness and scalability of Kraken. Kraken can improve the accuracy of recommendation tasks with the same memory resources, or cut memory usage to one third while maintaining model performance.

Keywords: systems for machine learning; Continual learning; Recommendation System
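Using different optimizers for dense and sparse parameters saves memory because optimizer state is paid per parameter. The sketch below assumes a specific pairing that the abstract does not state: standard Adam (two state arrays the size of the weights) for dense parameters, and row-wise Adagrad (one scalar accumulator per embedding row) for sparse embeddings whose rows are created lazily when a new feature ID first appears in the stream. Class names are hypothetical.

```python
import numpy as np


class SparseEmbeddingTable:
    """Embedding rows created on first use; row-wise Adagrad keeps one scalar per row."""

    def __init__(self, dim: int, lr: float = 0.1, eps: float = 1e-8, seed: int = 0):
        self.dim, self.lr, self.eps = dim, lr, eps
        self.rows = {}    # feature id -> embedding vector
        self.accum = {}   # feature id -> running sum of mean squared gradients
        self.rng = np.random.default_rng(seed)

    def lookup(self, fid: int) -> np.ndarray:
        if fid not in self.rows:  # a new sparse feature appears in the data stream
            self.rows[fid] = self.rng.normal(0.0, 0.01, self.dim)
            self.accum[fid] = 0.0
        return self.rows[fid]

    def apply_gradient(self, fid: int, grad: np.ndarray) -> None:
        self.lookup(fid)
        self.accum[fid] += float(np.mean(grad ** 2))
        self.rows[fid] -= self.lr * grad / (np.sqrt(self.accum[fid]) + self.eps)

    def optimizer_floats(self) -> int:
        return len(self.accum)  # one scalar of optimizer state per row


class DenseAdam:
    """Standard Adam for dense weights: two state arrays the size of the weights."""

    def __init__(self, shape, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        self.w = np.zeros(shape)
        self.m = np.zeros(shape)
        self.v = np.zeros(shape)
        self.lr, self.b1, self.b2, self.eps, self.t = lr, b1, b2, eps, 0

    def step(self, grad: np.ndarray) -> None:
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad ** 2
        m_hat = self.m / (1 - self.b1 ** self.t)  # bias-corrected first moment
        v_hat = self.v / (1 - self.b2 ** self.t)  # bias-corrected second moment
        self.w -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

    def optimizer_floats(self) -> int:
        return self.m.size + self.v.size
```

For 16-dimensional embeddings, Adam would keep 2 x 16 = 32 state floats per row, whereas the row-wise accumulator keeps 1, so the embedding table's optimizer state shrinks by 32x under these assumptions. Dense layers are usually small relative to the embeddings, so keeping the richer optimizer there costs comparatively little.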

Nautilus: An Optimized System for Deep Transfer Learning over Evolving Training Datasets

International Conference on Management of Data (SIGMOD)

Authors: Nakandala, Supun; Kumar, Arun. Affiliation: Univ Calif San Diego, La Jolla, CA 92093, USA

ISBN: (print) 9781450392495

Deep learning (DL) has revolutionized unstructured data analytics. But in most cases, DL needs massive labeled datasets and large compute clusters, which hinders its adoption. These limitations can be overcome using a popular paradigm called deep transfer learning (DTL). With DTL, one adapts a pre-trained DL model instead of training a model from scratch, which reduces the massive training data and compute otherwise required to train a model. During adaptation, a common practice is to freeze most parts of the pre-trained model and adapt only the remaining parts. Since no single adaptation scheme is universally the best, one often evaluates several schemes, a process also called model selection. We also observed that data labeling for DTL is seldom a one-off process: one often updates the labeled data intermittently by adding new labeled examples and performs model selection again to evaluate the accuracy of the trained models. Today, this workload is executed by performing computations over the entire pre-trained model and repeating them for every model selection cycle. This approach results in redundant computation in the frozen model parts and causes usability and system-efficiency issues. In this work, we reimagine DTL model selection in the presence of frozen layers as an instance of multi-query optimization and propose two optimizations that reduce redundant computations and training overheads. We implement our optimizations in a data system called NAUTILUS. Experiments with end-to-end workloads on benchmark datasets show that NAUTILUS reduces DTL model selection runtimes by up to 5X compared to the current practice.

Keywords: systems for machine learning; Deep Transfer Learning; Multi-Query Optimization
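The redundancy described above comes from recomputing frozen layers for the same examples on every model selection cycle. A minimal way to see the saving is to materialize the frozen prefix's outputs per example and reuse them, so later cycles only run the frozen layers on newly labeled data. This is a simplified sketch assuming every candidate head sits on the same frozen prefix; it is not NAUTILUS's actual pair of optimizations, and all names (`FrozenFeatureCache`, `model_selection_cycle`) are hypothetical.

```python
from typing import Any, Callable, Dict, Hashable, List, Sequence, Tuple

Example = Tuple[Hashable, Any, Any]  # (example id, raw input, label)


class FrozenFeatureCache:
    """Materializes outputs of the frozen model prefix, keyed by example id."""

    def __init__(self, frozen_prefix: Callable[[Any], Any]):
        self.frozen_prefix = frozen_prefix
        self.store: Dict[Hashable, Any] = {}
        self.forward_calls = 0  # how many times the expensive frozen layers ran

    def features(self, examples: Sequence[Example]) -> List[Any]:
        out = []
        for ex_id, x, _ in examples:
            if ex_id not in self.store:  # only newly labeled examples pay the cost
                self.store[ex_id] = self.frozen_prefix(x)
                self.forward_calls += 1
            out.append(self.store[ex_id])
        return out


def model_selection_cycle(cache: FrozenFeatureCache, labeled: Sequence[Example],
                          heads: Dict[str, Callable[[List[Any], List[Any]], Any]]
                          ) -> Dict[str, Any]:
    """Train every candidate adaptation head on the shared frozen features."""
    feats = cache.features(labeled)
    labels = [y for _, _, y in labeled]
    return {name: train(feats, labels) for name, train in heads.items()}
```

If the first cycle sees three labeled examples and the second sees those three plus two new ones, the frozen prefix runs five times in total instead of eight. Real workloads complicate this because different adaptation schemes may freeze different numbers of layers, which is where a multi-query view over shared prefixes becomes useful.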

Varuna: Scalable, Low-cost Training of Massive Deep Learning Models

17th European Conference on Computer Systems (EuroSys)

Authors: Athlur, Sanjith; Saran, Nitika; Sivathanu, Muthian; Ramjee, Ramachandran; Kwatra, Nipun. Affiliations: Carnegie Mellon Univ, Pittsburgh, PA 15213, USA; Cornell Univ, Ithaca, NY, USA; Microsoft Res, Bangalore, Karnataka, India

ISBN: (print) 9781450391627

Systems for training massive deep learning models (billions of parameters) today assume and require specialized "hyperclusters": hundreds or thousands of GPUs wired with specialized high-bandwidth interconnects such as NVLink and InfiniBand. Besides being expensive, such dependence on hyperclusters and custom high-speed interconnects limits the size of such clusters, creating (a) scalability limits on job parallelism and (b) resource fragmentation across hyperclusters. In this paper, we present Varuna, a new system that enables training massive deep learning models on commodity networking. Varuna makes thrifty use of networking resources and automatically configures the user's training job to efficiently use any given set of resources. Therefore, Varuna is able to leverage "low-priority" VMs that are about 5x cheaper than dedicated GPUs, thus significantly reducing the cost of training massive models. We demonstrate the efficacy of Varuna by training massive models, including a 200-billion-parameter model, on 5x cheaper "spot VMs" while maintaining high training throughput. Varuna improves end-to-end training time for language models like BERT and GPT-2 by up to 18x compared to other model-parallel approaches and by up to 26% compared to other pipeline-parallel approaches on commodity VMs. The code for Varuna is available at https://***/microsoft/varuna.

Keywords: Distributed Systems; systems for machine learning; Large-Scale DNN Training
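The abstract says Varuna automatically configures a training job for whatever resources it is given. One way to picture such a configurator is to enumerate how N GPUs split into pipeline depth p and data-parallel width d = N / p, and score each split with an analytic model. The cost model below is an assumption for illustration only: a GPipe-style pipeline time of (m + p - 1) micro-batch slots, plus a gradient all-reduce cost proportional to each stage's share of the model when d > 1. Varuna's real planner is not described in the abstract, and `configure` and `Plan` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    stages: int          # pipeline depth p
    replicas: int        # data-parallel width d = num_gpus / p
    micro_batches: int   # micro-batches per replica per mini-batch, m
    est_time_s: float    # estimated time per mini-batch


def configure(num_gpus: int, model_mem_gb: float, gpu_mem_gb: float,
              batch_size: int, micro_batch: int, sample_time_s: float,
              sync_s_per_gb: float) -> Optional[Plan]:
    """Enumerate (pipeline depth, data-parallel width) splits and keep the fastest."""
    best: Optional[Plan] = None
    for p in range(1, num_gpus + 1):
        if num_gpus % p or model_mem_gb / p > gpu_mem_gb:
            continue  # GPUs must split evenly and each stage must fit in memory
        d = num_gpus // p
        if batch_size % (d * micro_batch):
            continue  # mini-batch must divide evenly into micro-batches
        m = batch_size // (d * micro_batch)
        stage_time = micro_batch * sample_time_s / p   # per micro-batch, per stage
        pipeline = (m + p - 1) * stage_time            # includes p - 1 fill/drain slots
        sync = sync_s_per_gb * model_mem_gb / p if d > 1 else 0.0  # gradient all-reduce
        plan = Plan(p, d, m, pipeline + sync)
        if best is None or plan.est_time_s < best.est_time_s:
            best = plan
    return best
```

For 8 GPUs, a 40 GB model on 16 GB GPUs, a mini-batch of 64 and micro-batches of 2, only p = 4 and p = 8 fit in memory. With a nonzero all-reduce cost the deeper pipeline (p = 8, d = 1) wins because it avoids gradient synchronization; with a free all-reduce the shallower pipeline wins because it has a smaller fill/drain bubble. That tension between communication and pipeline bubbles is the kind of trade-off a configurator on commodity networking has to weigh.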

SOL: Safe On-Node Learning in Cloud Platforms

27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)

Authors: Wang, Yawen; Crankshaw, Daniel; Yadwadkar, Neeraja J.; Berger, Daniel; Kozyrakis, Christos; Bianchini, Ricardo. Affiliations: Stanford Univ, Stanford, CA 94305, USA; Microsoft Res, Redmond, WA, USA; Univ Texas Austin, Austin, TX, USA

ISBN: (print) 9781450392051

Cloud platforms run many software agents on each server node. These agents manage all aspects of node operation and, in some cases, frequently collect data and make decisions. Unfortunately, their behavior is typically based on predefined static heuristics or offline analysis; they do not leverage on-node machine learning (ML). In this paper, we first characterize the spectrum of node agents in Azure and identify the classes of agents that are most likely to benefit from on-node ML. We then propose SOL, an extensible framework for designing ML-based agents that are safe and robust to the range of failure conditions that occur in production. SOL provides a simple API to agent developers and manages the scheduling and running of the agent-specific functions they write. We illustrate the use of SOL by implementing three ML-based agents that manage CPU cores, node power, and memory placement. Our experiments show that (1) ML substantially improves our agents, and (2) SOL ensures that agents operate safely under a variety of failure conditions. We conclude that ML-based agents show significant potential and that SOL can help build them.

Keywords: Cloud computing; on-node agents; machine learning for systems; systems for machine learning
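A common way to make an on-node ML agent safe is to wrap its model with checks that fall back to a conservative default action. The sketch below shows that pattern for a CPU-core agent like the one mentioned in the abstract. The specific safeguards (stale-input check, output validation, exception handling, and disabling the model after repeated bad outcomes) and every name here are assumptions; SOL's actual API is not given in the abstract.

```python
from typing import Callable, Sequence


class SafeAgent:
    """Wraps an ML policy with safeguards, falling back to a safe default action."""

    def __init__(self, model: Callable[[Sequence[float]], int], safe_default: int,
                 is_valid: Callable[[int], bool], max_age_s: float,
                 max_bad_outcomes: int):
        self.model = model
        self.safe_default = safe_default
        self.is_valid = is_valid
        self.max_age_s = max_age_s
        self.max_bad_outcomes = max_bad_outcomes
        self.bad_streak = 0
        self.disabled = False

    def decide(self, features: Sequence[float], data_age_s: float) -> int:
        if self.disabled or data_age_s > self.max_age_s:
            return self.safe_default  # model switched off, or inputs too stale
        try:
            action = self.model(features)
        except Exception:
            return self.safe_default  # an inference failure must not break the node
        return action if self.is_valid(action) else self.safe_default

    def report_outcome(self, good: bool) -> None:
        """Disable the model after too many consecutive bad outcomes."""
        self.bad_streak = 0 if good else self.bad_streak + 1
        if self.bad_streak >= self.max_bad_outcomes:
            self.disabled = True
```

For example, an agent that predicts how many of 8 cores a VM needs could use `safe_default=8` (give the VM everything) and `is_valid=lambda c: 1 <= c <= 8`. Defaulting to the most generous allocation is a deliberate choice: a wrong ML decision then costs efficiency rather than customer performance.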

Layerweaver: Maximizing Resource Utilization of Neural Processing Units via Layer-Wise Scheduling

27th IEEE International Symposium on High-Performance Computer Architecture (HPCA)

Authors: Oh, Young H.; Kim, Seonghak; Jin, Yunho; Son, Sam; Bae, Jonghyun; Lee, Jongsung; Park, Yeonhong; Kim, Dong Uk; Ham, Tae Jun; Lee, Jae W. Affiliations: Sungkyunkwan Univ, Dept Elect & Comp Engn, Suwon, South Korea; Seoul Natl Univ, Neural Proc Res Ctr (NPRC), Dept Comp Sci & Engn, Seoul, South Korea

ISBN: (print) 9781665422352

To meet surging demands for deep learning inference services, many cloud computing vendors employ high-performance specialized accelerators called neural processing units (NPUs). One important challenge for effective use of NPUs is to achieve high resource utilization over a wide spectrum of deep neural network (DNN) models with diverse arithmetic intensities. There is often an intrinsic mismatch between the compute-to-memory-bandwidth ratio of an NPU and the arithmetic intensity of the model it executes, leading to under-utilization of either compute resources or memory bandwidth. Ideally, we want to saturate both compute throughput (TOPS) and DRAM bandwidth to achieve high system throughput. Thus, we propose Layerweaver, an inference serving system with a novel multi-model time-multiplexing scheduler for NPUs. Layerweaver reduces the temporal waste of computation resources by interweaving the layer execution of multiple models with opposing characteristics: compute-intensive and memory-intensive. Layerweaver hides the memory time of a memory-intensive model by overlapping it with the relatively long computation time of a compute-intensive model, thereby minimizing the idle time of the computation units waiting for off-chip data transfers. In a two-model serving scenario with batch size 1 and 16 different pairs of compute- and memory-intensive models, Layerweaver improves the temporal utilization of computation units and memory channels by 44.0% and 28.7%, respectively, increasing system throughput by 60.1% on average over the baseline that executes one model at a time.

Keywords: Layer-wise Scheduling; systems for machine learning; Inference Serving System; Neural Networks; Accelerator Systems; Multi-tasking
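The overlap argument above can be checked with a toy two-engine timeline, in which off-chip transfers and computation proceed concurrently. The timing model is a deliberate simplification and an assumption: layers run in list order, a layer computes only after its data arrives, and the next layer's transfer may start only once the previous layer has begun computing (a single prefetch buffer). Round-robin interleaving stands in for Layerweaver's actual scheduler, which the abstract does not detail.

```python
from typing import List, Tuple

Layer = Tuple[float, float]  # (off-chip memory transfer time, compute time)


def makespan(layers: List[Layer]) -> float:
    """Finish time when a DMA engine and the compute units run concurrently.

    Assumed model: a layer computes only after its data has arrived, and the
    next layer's transfer may start only once the previous layer has begun
    computing (a single prefetch buffer).
    """
    load_end = 0.0
    comp_start = 0.0  # start of the previous layer's computation
    comp_end = 0.0
    for mem, comp in layers:
        load_start = max(load_end, comp_start)
        load_end = load_start + mem
        comp_start = max(load_end, comp_end)
        comp_end = comp_start + comp
    return comp_end


def interleave(a: List[Layer], b: List[Layer]) -> List[Layer]:
    """Round-robin layer order across two models."""
    out: List[Layer] = []
    for i in range(max(len(a), len(b))):
        out.extend(a[i:i + 1])  # empty slice once a model runs out of layers
        out.extend(b[i:i + 1])
    return out
```

With a compute-intensive model `[(1.0, 4.0)] * 3` and a memory-intensive model `[(4.0, 1.0)] * 3`, running one model after the other finishes at time 22, while interleaving their layers finishes at 16. Total compute is 15 time units in both cases, so compute utilization rises from about 68% to about 94% in this toy model.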

Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale

ACM SIGMOD International Conference on Management of Data (SIGMOD)

Authors: Bach, Stephen H.; Rodriguez, Daniel; Liu, Yintao; Luo, Chong; Shao, Haidong; Xia, Cassandra; Sen, Souvik; Ratner, Alex; Hancock, Braden; Alborzi, Houman; Kuchhal, Rahul; Re, Chris; Malkin, Rob. Affiliations: Brown Univ, Providence, RI 02912, USA; Google, Mountain View, CA 94043, USA; Stanford Univ, Stanford, CA 94305, USA

ISBN: (print) 9781450356435

Labeling training data is one of the most costly bottlenecks in developing machine learning-based applications. We present a first-of-its-kind study showing how existing knowledge resources from across an organization can be used as weak supervision to bring development time and cost down by an order of magnitude, and introduce Snorkel DryBell, a new weak supervision management system for this setting. Snorkel DryBell builds on the Snorkel framework, extending it in three critical aspects: flexible, template-based ingestion of diverse organizational knowledge; cross-feature production serving; and scalable, sampling-free execution. On three classification tasks at Google, we find that Snorkel DryBell creates classifiers of comparable quality to ones trained with tens of thousands of hand-labeled examples, converts non-servable organizational resources into servable models with an average 52% performance improvement, and executes over millions of data points in tens of minutes.

Keywords: systems for machine learning; weak supervision
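Weak supervision, as described above, turns existing knowledge resources into labeling functions that vote on (or abstain from) each unlabeled example, and then combines those noisy votes into training labels. Snorkel combines votes with a learned generative label model; the sketch below swaps in a simpler fixed-weight vote to show the data flow. The three labeling functions, their keywords, and the weights are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence

ABSTAIN = None
LabelingFunction = Callable[[str], Optional[int]]


# Hypothetical labeling functions encoding simple organizational heuristics.
def lf_mentions_buy(text: str) -> Optional[int]:
    return 1 if "buy" in text.lower() else ABSTAIN


def lf_has_price(text: str) -> Optional[int]:
    return 1 if "$" in text else ABSTAIN


def lf_very_short(text: str) -> Optional[int]:
    return 0 if len(text.split()) < 3 else ABSTAIN


def apply_lfs(examples: Sequence[str],
              lfs: Sequence[LabelingFunction]) -> List[List[Optional[int]]]:
    """Label matrix: one row per example, one column per labeling function."""
    return [[lf(x) for lf in lfs] for x in examples]


def weighted_vote(votes: Sequence[Optional[int]],
                  weights: Sequence[float]) -> Optional[int]:
    """Combine non-abstaining votes; ties and all-abstain rows stay unlabeled."""
    scores: Counter = Counter()
    for vote, w in zip(votes, weights):
        if vote is not ABSTAIN:
            scores[vote] += w
    if not scores:
        return ABSTAIN
    top = max(scores.values())
    winners = [label for label, s in scores.items() if s == top]
    return winners[0] if len(winners) == 1 else ABSTAIN
```

On the inputs "Buy now for $5", "hello", and "nice weather today" with equal weights, the combined labels are 1, 0, and unlabeled. A learned label model improves on this by estimating each labeling function's accuracy from agreements and disagreements instead of requiring hand-set weights.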
