When implementing the gradient descent method in low precision, the use of stochastic rounding schemes helps to prevent stagnation of convergence caused by the vanishing gradient effect. Unbiased stochastic rounding yields zero bias by preserving small updates with probabilities proportional to their relative magnitudes. This study provides a theoretical explanation for the stagnation of the gradient descent method in low-precision computation. Additionally, we propose two new stochastic rounding schemes that trade the zero-bias property for a larger probability of preserving small gradients. Our methods yield a constant rounding bias that, on average, lies in a descent direction. For convex problems, we prove that the proposed rounding methods typically have a beneficial effect on the convergence rate of gradient descent. We validate our theoretical analysis by comparing the performance of various rounding schemes when optimizing a multinomial logistic regression model and when training a simple neural network with an 8-bit floating-point format.
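As a rough illustration of the mechanism described above, the sketch below applies unbiased stochastic rounding on a uniform grid (a simplification of a real low-precision floating-point format, whose grid spacing varies with the exponent) to a single gradient-descent update. All names and parameter values are illustrative, not taken from the paper; the point is only that an update far smaller than the grid spacing survives in expectation, whereas round-to-nearest would discard it.

```python
# A minimal sketch of unbiased stochastic rounding on a uniform grid with
# spacing `eps` (a simplification of a low-precision floating-point format).
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x, eps):
    """Round each entry of x to a multiple of eps, up or down at random.

    The probability of rounding up equals the fractional distance to the
    lower grid point, so E[stochastic_round(x)] == x (zero rounding bias).
    """
    scaled = np.asarray(x, dtype=np.float64) / eps
    lower = np.floor(scaled)
    p_up = scaled - lower                      # distance to the lower grid point
    up = rng.random(scaled.shape) < p_up       # round up with probability p_up
    return (lower + up) * eps

def sgd_step_low_precision(w, grad, lr, eps):
    # With round-to-nearest, updates with |lr * grad| < eps / 2 vanish and
    # convergence stagnates; stochastic rounding keeps them alive on average.
    return stochastic_round(w - lr * grad, eps)

# Tiny demonstration: a small update survives in expectation.
w = np.array([1.0])
g = np.array([0.001])          # lr * g is far below the grid spacing 2**-4
steps = [sgd_step_low_precision(w, g, lr=1.0, eps=2**-4) for _ in range(1000)]
print(np.mean(steps))          # ~0.999 on average, not stuck at 1.0
```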
A large gap exists between the computational demand of deep neural network (DNN) applications and the computing power of DNN accelerators. Low-precision floating-point (LP-FP) computation is one of the important means of improving the performance of DNN training and inference. However, high-precision accumulators are typically used to sum the dot products during general matrix multiplication (GEMM) in tensor cores (TCs). As the precision of the data decreases, the accumulator becomes the main consumer of the multiply-accumulate (MAC) unit's area and power. Reducing the accumulators' bit-width is therefore of significant importance for improving the area and energy efficiency of TCs. There are two main challenges: 1) providing theoretical support for the floating-point (FP) formats with the lowest bit-width for TC accumulators and 2) integrating the LP-FP TC into a DNN training and inference framework to evaluate its benefits. In this article, we propose accumulation bit-width scaling (ABS), a novel method to guide the design of LP-FP TCs. We 1) implement this method by constructing a novel variance retention ratio (VRR) model that predicts the FP format with the minimum bit-width for a TC's accumulator; 2) provide a generator of DNN accelerators based on a systolic-array (SA) TC, supporting many low-precision configurations; and 3) design an LP-FP DNN execution framework that supports a software-simulation mode and a hardware-accelerator mode to run LP-FP DNN tasks. The experimental results show that the LP-FP TC guided by our ABS method achieves maximum reductions of 76.47% and 75.60% in area and power consumption, respectively, compared with advanced TCs.
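The sketch below is not the paper's ABS method or VRR model; it only illustrates, under the assumption of a reduced-mantissa accumulator, why the accumulator's bit-width matters for the numerical quality of a GEMM dot product: when every partial sum is rounded to a few mantissa bits, small addends are swamped as the running sum grows.

```python
# A minimal sketch (not the paper's ABS/VRR model) of a dot product whose
# running sum is rounded to a reduced-mantissa format after every
# multiply-accumulate, as a low-bit-width accumulator would do.
import numpy as np

def round_to_mantissa(x, mant_bits):
    """Quantize x to `mant_bits` explicit mantissa bits (exponent kept exact)."""
    m, e = np.frexp(x)                       # x = m * 2**e with 0.5 <= |m| < 1
    m = np.round(m * 2**mant_bits) / 2**mant_bits
    return np.ldexp(m, e)

def dot_low_precision_acc(a, b, acc_mant_bits):
    acc = 0.0
    for ai, bi in zip(a, b):
        acc = round_to_mantissa(acc + float(ai) * float(bi), acc_mant_bits)
    return acc

rng = np.random.default_rng(0)
a = rng.standard_normal(4096).astype(np.float32)
b = rng.standard_normal(4096).astype(np.float32)

exact = float(np.dot(a.astype(np.float64), b.astype(np.float64)))
for bits in (23, 10, 7):                     # fp32-, fp16-, bf16-like accumulators
    approx = dot_low_precision_acc(a, b, bits)
    print(f"{bits:2d}-bit mantissa accumulator: rel. error "
          f"{abs(approx - exact) / abs(exact):.2e}")
```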
The volume of available data has been growing exponentially, increasing the complexity and obscurity of data problems. In response, visual analytics (VA) has gained attention, yet its solutions have not scaled well for big data. Computational methods can improve VA's scalability by giving users compact, meaningful information about the input data. However, the significant computation time these methods require hinders real-time interactive visualization of big data. By addressing crucial discrepancies between these methods and VA regarding precision and convergence, researchers have proposed ways to customize them for VA. These approaches, which include low-precision computation and iteration-level interactive visualization, ensure real-time interactive VA for big data.
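As a loose illustration of iteration-level interactive visualization (one of the approaches mentioned above), the sketch below wraps an iterative computation, here an illustrative k-means loop rather than code from the article, as a generator that yields its intermediate state after every pass, so a visualization layer can redraw immediately instead of waiting for full convergence.

```python
# A minimal sketch of "iteration-level" result delivery: the computation
# yields partial results each pass for a front end to render.
import numpy as np

def kmeans_iterations(points, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for it in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        centers = np.array([points[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
        yield it, centers, labels              # partial result for the view layer

points = np.random.default_rng(1).standard_normal((1000, 2))
for it, centers, labels in kmeans_iterations(points, k=3):
    # A real VA system would push `centers` and `labels` to the chart here.
    print(f"iteration {it}: centers =\n{centers.round(2)}")
```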
Low-precision computation has emerged as one of the most effective techniques for accelerating convolutional neural networks and has garnered widespread support on modern hardware. Despite its effectiveness, however, low-precision computation has not been commonly applied to fast convolutions, such as the Winograd algorithm, due to numerical issues. In this article, we propose an effective quantized Winograd convolution, named LoWino, which employs an in-side quantization method in the Winograd domain to reduce the precision loss caused by the transformations. Meanwhile, we present an efficient implementation that integrates well-designed optimization techniques, allowing us to fully exploit the capabilities of low-precision computation on modern CPUs. We evaluate LoWino on two Intel Xeon Scalable Processor platforms with representative convolutional layers and neural network models. The experimental results demonstrate that our approach achieves average operator speedups of 1.84x and 1.91x over state-of-the-art implementations in the vendor library while keeping the accuracy loss at a reasonable level.
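The following sketch illustrates the general idea of quantizing inside the Winograd domain for a 1-D F(2, 3) tile (two outputs, kernel size 3): the transformed input and filter are quantized to int8, the element-wise products run in integer arithmetic, and the result is dequantized before the output transform. It is not the LoWino implementation and omits all of its optimizations; only the transform matrices are the standard Winograd F(2, 3) matrices.

```python
# A minimal sketch of quantization inside the Winograd domain for 1-D F(2, 3).
import numpy as np

# Standard Winograd F(2, 3) transform matrices.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float32)
G  = np.array([[1,   0,   0],
               [0.5, 0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0,   0,   1]], dtype=np.float32)
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float32)

def quantize_int8(x):
    """Symmetric per-tile quantization to int8 with a single scale."""
    scale = np.abs(x).max() / 127.0 or 1.0
    return np.round(x / scale).astype(np.int8), scale

def winograd_f23_quantized(d, g):
    """d: input tile of 4 samples, g: filter of 3 taps -> 2 outputs."""
    V, sv = quantize_int8(BT @ d)            # input transform, then quantize
    U, su = quantize_int8(G @ g)             # filter transform, then quantize
    # Element-wise int8 products (widened to int32), dequantized with both scales.
    M = (V.astype(np.int32) * U.astype(np.int32)) * (sv * su)
    return AT @ M.astype(np.float32)         # output transform on dequantized values

d = np.array([1.0, 2.0, -1.0, 0.5], dtype=np.float32)
g = np.array([0.25, -0.5, 1.0], dtype=np.float32)
print(winograd_f23_quantized(d, g))            # quantized Winograd result
print(np.convolve(d, g[::-1], mode="valid"))   # reference: direct correlation
```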