As deep neural network (DNN) models become more accurate, problems such as large numbers of parameters and high computational complexity have become increasingly prominent, creating a bottleneck for deployment on resource-limited embedded platforms. In recent years, logarithm-based quantization techniques have shown great potential for reducing the inference cost of neural networks. However, current single-model logarithmic quantization has reached an upper limit of classification performance, and little work has investigated hardware implementations of neural network quantization. In this paper, we propose a full logarithmic quantization (FLQ) mechanism that quantizes both weights and activations into the logarithmic domain, compressing the parameters of the AlexNet and VGG16 models by more than 6.4 times while keeping the accuracy loss within 2.5% of the benchmark. Furthermore, we propose two optimizations of FLQ: activation segmented full logarithmic quantization (ASFLQ) and multi-ratio activation segmented full logarithmic quantization (Multi-ASFLQ), which better balance the numerical representation range against the quantization step. With weights quantized to 5 bits and activations to 4 bits, the proposed optimizations improve the top-1 accuracy of the VGG16 model by 1% and 1.6%, respectively. We then propose an implementation scheme for the computing unit corresponding to the optimized FLQ mechanism, which not only converts multiplication operations into shift operations but also integrates functions such as logarithmic bases with different ratios and sparsity processing for activations, minimizing resource consumption and avoiding unnecessary calculations. Finally, experiments with the VGG19, ResNet50, and DenseNet169 models show that the proposed method achieves good performance under lower-bit quantization.
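The abstract does not give the quantizer itself; as a rough illustration of the underlying idea, here is a minimal Python sketch of power-of-two (log-domain) quantization in which a multiply becomes an exponent addition (a shift in hardware). The function names, bit-width handling, and clipping are my own assumptions, not the paper's FLQ/ASFLQ definitions.

```python
import numpy as np

def log_quantize(x, bits=5):
    """Quantize values to signed powers of two (a simplified log-domain
    quantizer; the paper's FLQ/ASFLQ schemes are more elaborate)."""
    sign = np.sign(x)
    mag = np.clip(np.abs(x), 1e-12, None)   # avoid log(0); real schemes treat zeros/sparsity separately
    exp = np.round(np.log2(mag)).astype(np.int32)
    # Restrict exponents to the range representable with the given bit width.
    exp = np.clip(exp, -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return sign, exp

def shift_multiply(w_sign, w_exp, a_sign, a_exp):
    """With both operands in the log domain, a multiply is an exponent add
    (i.e., a shift in hardware) plus a sign combination."""
    return w_sign * a_sign, w_exp + a_exp

# Toy usage: quantize a weight and an activation, then 'multiply' by shifting.
w_sign, w_exp = log_quantize(np.array([0.37]), bits=5)
a_sign, a_exp = log_quantize(np.array([1.8]), bits=4)
s, e = shift_multiply(w_sign, w_exp, a_sign, a_exp)
print(s * np.exp2(e.astype(float)))   # approximates 0.37 * 1.8
```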
In future transportation, the on-board unit (OBU) is a key component of connected vehicles; its computing resources are limited, and it may not be able to handle the heavy computing burden imposed by V2X networks. For these cases, we employ a multi-access edge cloud (MEC) and a remote cloud to schedule the OBUs' tasks. The schedule aims to minimise the total completion time of all tasks and the number of computing units of the MEC server. We first introduce a multi-objective optimisation model that accounts for the tasks and cloud-edge collaboration. We then propose a task scheduling strategy for this model based on the resource matching degree. In this strategy, we propose an improved hybrid genetic algorithm and employ a resource matching measure between tasks and computing units, in terms of computing, storage and network bandwidth resources, to obtain better solutions across generations. Numerical results show the effectiveness of our strategy.
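The abstract does not define the matching metric; below is a hedged sketch of one plausible resource-matching measure across the three resource dimensions named above. The Task/Unit fields and the scoring rule are illustrative assumptions, not the paper's formulation.

```python
from dataclasses import dataclass

@dataclass
class Task:
    cpu: float        # required computing capacity (normalized)
    storage: float    # required storage
    bandwidth: float  # required network bandwidth

@dataclass
class Unit:
    cpu: float
    storage: float
    bandwidth: float

def matching_degree(task: Task, unit: Unit) -> float:
    """Illustrative matching measure: how closely a computing unit's free
    resources fit a task's demands. Returns a value in [0, 1];
    0 means the unit cannot satisfy at least one requirement."""
    pairs = ((task.cpu, unit.cpu),
             (task.storage, unit.storage),
             (task.bandwidth, unit.bandwidth))
    if any(u < t for t, u in pairs):   # insufficient in some dimension
        return 0.0
    # Units whose capacity is close to the demand score higher,
    # discouraging waste of large units on small tasks.
    return sum(t / u for t, u in pairs) / len(pairs)

# Toy usage: pick the MEC computing unit that best matches a task.
task = Task(cpu=2.0, storage=1.0, bandwidth=0.5)
units = [Unit(4.0, 2.0, 1.0), Unit(2.5, 1.2, 0.6)]
best = max(units, key=lambda u: matching_degree(task, u))
print(best)
```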
The speed of open-weight large language models (LLMs) run on GPUs, and its dependence on the task at hand, is studied in order to present a comparative analysis of the speed of the most popular open LLMs.
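The abstract does not describe the measurement setup; as a minimal sketch of how per-task generation throughput could be measured on a GPU with Hugging Face Transformers, the following uses a placeholder model name, prompt, and token budget that are not taken from the paper.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-hf"   # placeholder open-weight model
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16).cuda()

def tokens_per_second(prompt: str, max_new_tokens: int = 128) -> float:
    """Time a single generation and report generated tokens per second."""
    inputs = tok(prompt, return_tensors="pt").to("cuda")
    torch.cuda.synchronize()
    start = time.perf_counter()
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
    return new_tokens / elapsed

# Task dependence can be probed by running the same measurement over
# task-specific prompt sets (e.g., summarization vs. code generation).
print(tokens_per_second("Summarize the theory of relativity in one paragraph."))
```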
Many encoding algorithms for systematic polar codes (SPC) have been introduced since SPC was proposed in 2011. However, the number of exclusive-OR (XOR) computing units has not yet been optimized. Based on an iterative property of the generator matrix and its particular lower triangular structure, we propose an optimized encoding algorithm (OEA) for SPC that reduces the number of XOR computing units compared with existing non-recursive algorithms. We also prove that this property of the generator matrix extends to different code lengths and rates of polar codes. Through matrix segmentation and transformation, we obtain a submatrix with all zero elements, saving computation resources. The proportion of zero elements in the matrix reaches up to 58.5% with the OEA for SPC when the code length is 2048 and the code rate is 0.5. Furthermore, the proposed OEA is better suited to hardware implementation than existing recursive algorithms, in which signals are transmitted bidirectionally.
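As background for the generator-matrix property the abstract relies on, here is a small Python sketch that builds the polar generator matrix as the n-fold Kronecker power of F = [[1, 0], [1, 1]] and counts its zero entries; the segmentation and transformation steps of the proposed OEA itself are not reproduced, and the zero-fraction count is only an illustration of why zero entries save XOR operations in a non-recursive encoder.

```python
import numpy as np

F = np.array([[1, 0],
              [1, 1]], dtype=np.uint8)

def polar_generator(n: int) -> np.ndarray:
    """n-fold Kronecker power of F, giving the lower-triangular
    generator matrix G_N of a length-N = 2**n polar code."""
    G = np.array([[1]], dtype=np.uint8)
    for _ in range(n):
        G = np.kron(G, F)
    return G

def zero_fraction(G: np.ndarray) -> float:
    """Fraction of zero entries; each zero is a product term that a
    non-recursive (matrix-multiplication) encoder never needs to XOR."""
    return 1.0 - G.sum() / G.size

G8 = polar_generator(3)   # N = 8 toy example
print(G8)
print(f"zero fraction: {zero_fraction(G8):.3f}")
```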