Conventional split computing approaches for AI models that generate large outputs suffer from long transmission and inference times. Due to the limited resources of the edge server and the selfishness of mobile devices (MDs), some MDs cannot offload their tasks and must sacrifice performance. To address these issues, we formulate an optimization problem that determines one or two split points to minimize inference latency while ensuring fair offloading among MDs. Additionally, we devise a low-complexity heuristic algorithm called fast and fair split computing (F2SC). Evaluation results demonstrate that F2SC reduces inference time by 3.8% to 20.1% compared to conventional approaches while maintaining fairness. (c) 2024 The Author(s). Published by Elsevier B.V. on behalf of The Korean Institute of Communications and Information Sciences. This is an open access article under the CC BY-NC-ND license (http://***/licenses/by-nc-nd/4.0/).
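To make the split-point idea concrete, below is a minimal sketch of choosing a single split point by minimizing estimated end-to-end latency. The per-layer timings, output sizes, and bandwidth are hypothetical placeholders; the paper's actual F2SC heuristic and its fairness constraint are not reproduced here.

```python
# Sketch of single-split-point selection for split computing.
# All profiles below are illustrative assumptions, not the paper's measurements.

# Per-layer (device_time_ms, server_time_ms, output_size_kb) for a toy model.
layers = [
    (2.0, 0.4, 800.0),   # conv1
    (3.5, 0.6, 400.0),   # conv2
    (5.0, 0.9, 100.0),   # conv3
    (1.0, 0.2, 10.0),    # fc
]
BANDWIDTH_KB_PER_S = 1000.0   # assumed uplink bandwidth
RAW_INPUT_KB = 1500.0         # assumed size of the raw input sample


def end_to_end_latency(split: int) -> float:
    """Latency if layers[:split] run on the device and layers[split:] on the edge server."""
    device = sum(t_d for t_d, _, _ in layers[:split])
    server = sum(t_s for _, t_s, _ in layers[split:])
    # Transmit the output of the last on-device layer (or the raw input if split == 0).
    tx_kb = layers[split - 1][2] if split > 0 else RAW_INPUT_KB
    transmit_ms = tx_kb / BANDWIDTH_KB_PER_S * 1000.0
    return device + transmit_ms + server


best_split = min(range(len(layers) + 1), key=end_to_end_latency)
print(f"best split point: {best_split}, latency: {end_to_end_latency(best_split):.1f} ms")
```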
Deep learning inference, which makes trained deep learning models available to resource-constrained clients, is usually deployed as a cloud-based framework. However, existing cloud-based frameworks suffer from severe information leakage or incur a significant increase in communication cost. In this work, we address privacy-preserving deep learning inference so that both the privacy of the input data and the model parameters are protected at low communication and computational cost. Additionally, the user can verify the correctness of the results with small overhead, which is very important for critical applications. Specifically, by designing secure sub-protocols, we introduce a new layer that collaboratively performs the secure computations involved in the inference. With the help of secret sharing, we inject verifiable data into the input, enabling us to check the correctness of the returned inference results. Theoretical analyses and extensive experimental results on the MNIST and CIFAR10 datasets validate the superiority of the proposed privacy-preserving and verifiable deep learning inference (PVDLI) framework. (c) 2022 Elsevier B.V. All rights reserved.
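The abstract builds on secret sharing as its core primitive. Below is a minimal sketch of 2-out-of-2 additive secret sharing over a prime field, a standard building block for this kind of protocol; it is not the paper's PVDLI construction, and the field modulus and party layout are assumptions made for illustration.

```python
import secrets

# Minimal 2-party additive secret sharing over a prime field.
# The modulus and two-party setup are illustrative assumptions.
P = 2**61 - 1  # a Mersenne prime used as the modulus


def share(x: int) -> tuple[int, int]:
    """Split x into two random-looking shares that sum to x mod P."""
    r = secrets.randbelow(P)
    return r, (x - r) % P


def reconstruct(s0: int, s1: int) -> int:
    return (s0 + s1) % P


# Each party can add its shares locally; reconstruction yields the sum of the secrets,
# which is why linear layers can be evaluated on shares without revealing inputs.
a0, a1 = share(42)
b0, b1 = share(100)
assert reconstruct((a0 + b0) % P, (a1 + b1) % P) == 142
```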
ISBN (print): 9781665423830
With the rapid development of modern deep learning technology, deep neural network (DNN)-based mobile applications have been considered for various areas. However, since mobile devices are not optimized to run DNN applications due to their limited computational resources, several computation offloading-based approaches have been introduced to overcome this issue; for DNN models, it has been reported that elaborate partitioning, in which input samples are partially processed on the mobile device and the edge server executes the rest, can effectively improve runtime performance. In addition, to improve communication efficiency in the offloading scenario, there have also been studies that reduce the data transmitted between a mobile device and the edge server by leveraging model compression. However, the existing approaches share a fundamental limitation: their performance ultimately depends on the architecture of the original DNN model. To overcome this, we propose a novel neural architecture search (NAS) method that takes computation offloading into account. On top of existing NAS approaches, we additionally introduce a resource selection mask and a channel selection mask. The resource selection mask divides the operations of the target model between the mobile device and the edge server; the channel selection mask allows only selected channels to be transmitted to the edge server without reducing task performance (e.g., accuracy). Based on these two masks, we introduce a new loss function for the NAS procedure that accounts for end-to-end inference time as well as the task performance that is the original goal of NAS. In the evaluation, the proposed method is compared to existing approaches; the experimental results show that our method outperforms both previous NAS and pruning-based model partitioning approaches.
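As a rough illustration of a latency-aware NAS objective of the kind the abstract describes, the sketch below combines a task loss with a latency estimate driven by relaxed resource and channel selection masks. The cost model, the sigmoid relaxation, and the trade-off weight `lam` are assumptions for illustration, not the paper's actual loss function.

```python
import torch

# Illustrative NAS-style loss: task loss plus a latency term controlled by two
# learnable masks. All cost values and the latency model are assumptions.

def offloading_nas_loss(task_loss: torch.Tensor,
                        resource_mask: torch.Tensor,   # per-layer logits, ~1 => run on device
                        channel_mask: torch.Tensor,    # per-channel logits, ~1 => transmit channel
                        device_cost: torch.Tensor,     # per-layer device latency estimate
                        server_cost: torch.Tensor,     # per-layer server latency estimate
                        per_channel_tx_cost: float,
                        lam: float = 0.1) -> torch.Tensor:
    r = torch.sigmoid(resource_mask)   # relax the binary placement decision
    c = torch.sigmoid(channel_mask)    # relax the binary channel selection
    compute = (r * device_cost + (1 - r) * server_cost).sum()
    transmit = per_channel_tx_cost * c.sum()
    latency = compute + transmit
    return task_loss + lam * latency


# Toy usage with random tensors standing in for a real model's profiling statistics.
loss = offloading_nas_loss(
    task_loss=torch.tensor(1.3),
    resource_mask=torch.zeros(8, requires_grad=True),
    channel_mask=torch.zeros(64, requires_grad=True),
    device_cost=torch.rand(8),
    server_cost=torch.rand(8) * 0.2,
    per_channel_tx_cost=0.05,
)
loss.backward()
```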
ISBN (print): 9781728152219
Hybrid analog-digital neuromorphic accelerators show promise for a significant increase in performance per watt for deep learning inference and training compared with conventional technologies. In this work we present an FPGA demonstrator of a programmable hybrid inferencing accelerator, in which memristor analog dot-product engines are emulated by digital matrix-vector multiplication units that use FPGA SRAM for in-situ weight storage. The full-chip demonstrator, interfaced to a host over PCIe, serves as a software development platform and as a vehicle for further hardware microarchitecture improvements. The implementation of the compute cores, tiles, network-on-chip, and host interface is discussed. A new pipelining scheme is introduced to achieve high utilization of the matrix-vector multiplication units while reducing the tile data memory required for neural network layer activations. The data-flow orchestration between the tiles, controlled by a RISC-V core, is described. An inference accuracy analysis is presented for example RNN and CNN models. The demonstrator is instrumented with hardware monitors to enable performance measurement and tuning. Performance projections for a future memristor-based ASIC are also discussed.
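To illustrate the tiled matrix-vector computation such an accelerator emulates, here is a minimal software sketch in which each fixed-size tile stands in for one dot-product engine with locally stored weights and partial sums are accumulated across tiles. The tile size, data types, and accumulation scheme are assumptions, not the demonstrator's actual microarchitecture.

```python
import numpy as np

# Tiled matrix-vector multiply: each (i, j) tile computes a partial dot product,
# and partial sums are accumulated, loosely mirroring how tile outputs would be
# combined over an on-chip network. TILE and dtypes are illustrative assumptions.
TILE = 4


def tiled_matvec(weights: np.ndarray, x: np.ndarray) -> np.ndarray:
    rows, cols = weights.shape
    y = np.zeros(rows, dtype=weights.dtype)
    for i in range(0, rows, TILE):
        for j in range(0, cols, TILE):
            y[i:i + TILE] += weights[i:i + TILE, j:j + TILE] @ x[j:j + TILE]
    return y


W = np.random.randn(8, 8).astype(np.float32)
v = np.random.randn(8).astype(np.float32)
assert np.allclose(tiled_matvec(W, v), W @ v, atol=1e-5)
```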
The first-generation tensor processing unit (TPU) runs deep neural network (DNN) inference 15-30 times faster with 30-80 times better energy efficiency than contemporary CPUs and GPUs in similar semiconductor technologies. This domain-specific architecture (DSA) is a custom chip that has been deployed in Google datacenters since 2015, where it serves billions of people.