ISBN (print): 9781665414852
Recent research advances in deep learning have led to the development of small and powerful Convolutional Neural Network (CNN) architectures. Meanwhile, Field Programmable Gate Arrays (FPGAs) have become a popular hardware target for deploying them. FPGA implementations fall into two main categories: streaming hardware architectures and single-computation-engine designs. Streaming hardware architectures generally implement every layer as a discrete processing unit. They suit smaller models whose fully unfolded versions fit into resource-constrained targets. Single computation engines, on the other hand, can be scaled to fit a device and execute CNN models of different sizes and complexities. However, the performance of such one-size-fits-all implementations may vary across CNNs with different workload attributes, which leads to inefficient utilization of hardware resources. Combining the advantages of both approaches, this work proposes a new design paradigm called the semi-streaming architecture, in which layer-specialized configurable engines realize the network. As a proof of concept, this paper presents a set of five layer-specialized configurable processing engines for implementing an 8-bit quantized MobileNetV2 CNN model. The engines are chained to partially preserve data streaming. Each is tuned individually to efficiently process a specific type of layer:

- normalized addition of residuals (5.4 GOp/s);
- depthwise convolution (16 GOp/s);
- pointwise expansion (27.2 GOp/s);
- pointwise projection (27.2 GOp/s);
- standard 2D convolution (89.6 GOp/s).

The overall energy efficiency is 5.32 GOp/s/W at a 100 MHz system clock, with a total power of 6.2 W on an XCZU7EV SoC FPGA.
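The throughput figures above can be sanity-checked with a little arithmetic. The sketch below uses hypothetical helpers, not code from the paper, and does three things:

- converts each engine's quoted peak rate into operations per 100 MHz clock cycle;
- sums the per-engine peaks;
- multiplies the quoted efficiency by the quoted power.

It assumes that one multiply-accumulate (MAC) counts as two operations, a common convention that the abstract does not confirm.

```python
"""Back-of-envelope arithmetic on the figures quoted in the abstract."""

CLOCK_HZ = 100e6  # 100 MHz system clock (quoted)
TOTAL_POWER_W = 6.2  # total power (quoted)
EFFICIENCY_GOPS_PER_W = 5.32  # overall energy efficiency (quoted)

# Peak throughput per engine in GOp/s (quoted); the dict keys are hypothetical labels
ENGINE_GOPS = {
    "residual_add": 5.4,
    "depthwise": 16.0,
    "pointwise_expansion": 27.2,
    "pointwise_projection": 27.2,
    "standard_conv2d": 89.6,
}

OPS_PER_MAC = 2  # assumption: 1 MAC = 1 multiply + 1 add


def ops_per_cycle(gops: float, clock_hz: float = CLOCK_HZ) -> float:
    """Operations an engine must complete every clock cycle to reach `gops`."""
    return gops * 1e9 / clock_hz


def macs_per_cycle(gops: float, clock_hz: float = CLOCK_HZ) -> float:
    """The same rate expressed as MACs per cycle, under the OPS_PER_MAC assumption."""
    return ops_per_cycle(gops, clock_hz) / OPS_PER_MAC


def implied_throughput_gops(efficiency: float = EFFICIENCY_GOPS_PER_W,
                            power_w: float = TOTAL_POWER_W) -> float:
    """GOp/s implied by multiplying (GOp/s/W) by W."""
    return efficiency * power_w


if __name__ == "__main__":
    for name, gops in ENGINE_GOPS.items():
        print(f"{name:22s} {ops_per_cycle(gops):6.0f} ops/cycle "
              f"(~{macs_per_cycle(gops):4.0f} MACs/cycle)")
    print(f"sum of per-engine peaks: {sum(ENGINE_GOPS.values()):.1f} GOp/s")
    print(f"efficiency x power:      {implied_throughput_gops():.1f} GOp/s")
```

Under these assumptions, the standard 2D convolution engine must sustain 896 operations (about 448 MACs) per cycle. Efficiency times power gives about 33 GOp/s, well below the 165.4 GOp/s sum of per-engine peaks. The abstract does not say how these two figures relate (for example, whether all engines ever run at peak simultaneously), so treat the comparison as arithmetic only.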
Data movement between host and accelerators is one of the most challenging aspects of developing applications for heterogeneous systems. Most existing runtime systems for GPGPU programming require developers to perfor...
ISBN (print): 9781728168760
The proceedings contain 110 papers. The topics discussed include: SSDKeeper: self-adapting channel allocation to improve the performance of SSD devices; a study of graph analytics for massive datasets on distributed multi-GPUs; DPF-ECC: accelerating elliptic curve cryptography with floating-point computing power of GPUs; inter-job scheduling of high-throughput material screening applications; learning an effective charging scheme for mobile devices; improving transactional code generation via variable annotation and barrier elision; solving the container explosion problem for distributed high throughput computing; CycLedger: a scalable and secure parallel protocol for distributed ledger via sharding; DAG-aware joint task scheduling and cache management in Spark clusters; and understanding the interplay between hardware errors and user job characteristics on the Titan supercomputer.
ISBN (print): 9781665414852
Deep learning, especially image recognition, has been used extensively in applications such as autonomous driving. The robustness of deep learning models is critically important, because faulty image recognition results may cause serious incidents. In this paper, we propose a fast method that uses general image processing to quantify the robustness of five typical deep learning models under the Keras framework: VGG16, InceptionV3, ResNet50, DenseNet, and MobileNet. We analyze six metrics: accuracy, precision, recall, F1, recognition time, and impact factor. The evaluation data are publicly accessible image datasets from Kaggle. In our evaluation, adversarial samples are generated with common image processing methods such as gray-scaling, color reversal, and image flipping. These are ordinary operations that are easy to launch. The models are evaluated and compared comprehensively across the various image processing methods. The results show that DenseNet performs best in three conditions: baseline, gray-scaling, and horizontal flipping. MobileNet has the shortest decision latency under all image processing methods. The F1 score varies with attack intensity. InceptionV3 shows overall robustness in most conditions.
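The perturbations and metrics named in this abstract can be sketched in plain Python. This is a minimal illustration, not the authors' code, and it rests on several assumptions:

- Images are nested lists of 8-bit RGB tuples.
- Gray-scaling uses the ITU-R BT.601 luma weights; the abstract does not give a formula.
- Precision, recall, and F1 are macro-averaged over classes; the averaging scheme is also not stated.

The abstract's "impact factor" metric is not defined, so it is omitted.

```python
"""Minimal sketch of the image perturbations and classification metrics named above."""
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel 0..255 (assumed 8-bit RGB)
Image = List[List[Pixel]]  # list of rows, each row a list of pixels


def gray_scale(img: Image) -> Image:
    """Replace each pixel by its BT.601 luma (assumed formula), kept as 3 equal channels."""
    out: Image = []
    for row in img:
        new_row = []
        for r, g, b in row:
            y = round(0.299 * r + 0.587 * g + 0.114 * b)
            new_row.append((y, y, y))  # 3 channels so the model's input shape is unchanged
        out.append(new_row)
    return out


def color_reverse(img: Image) -> Image:
    """Invert every channel: v -> 255 - v."""
    return [[(255 - r, 255 - g, 255 - b) for r, g, b in row] for row in img]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left-to-right by reversing each row."""
    return [list(reversed(row)) for row in img]


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 (macro averaging is an assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)  # correctly predicted as c
        fp = sum(1 for t, p in pairs if t != c and p == c)  # wrongly predicted as c
        fn = sum(1 for t, p in pairs if t == c and p != c)  # c predicted as something else
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

An evaluation like the one described could run in three steps:

1. Pass each test image through a perturbation.
2. Classify it with each Keras model.
3. Compare the predicted labels with the ground truth via `classification_metrics`.

Recognition time could be measured around the prediction call with `time.perf_counter`. These function names are illustrative only.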