Efficient inference of deep learning models is challenging and of great value to both the academic and industrial communities. In this paper, we focus on exploiting the sparsity in input data to improve the performance of ...
In ophthalmology, early fundus screening is an economical and effective way to prevent blindness caused by ophthalmic diseases. Clinically, due to the lack of medical resources, manual diagnosis is time-consuming and ma...
Training deep neural networks (DNNs) is usually memory-hungry because the device memory capacity of DNN accelerators is limited. Characterizing the memory behavior of DNN training is critical for relieving device memory pressure. In this work, we pinpoint the memory behavior of each GPU device memory block during training by instrumenting the memory allocators of the runtime system. Our results show that the memory access patterns of device memory blocks are stable and repeat from one training iteration to the next. These observations are useful for future optimization of memory-efficient training from the perspective of raw memory access patterns.
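The idea of instrumenting an allocator and checking for a repeating per-iteration pattern can be sketched as follows. This is only an illustration, not the authors' tool: `TracingAllocator`, `iteration_signature`, and `fake_training_step` are hypothetical names, the workload is synthetic, and only allocation and release events are recorded rather than raw memory accesses.

```python
class TracingAllocator:
    """Hypothetical allocator wrapper that logs every malloc/free event."""

    def __init__(self):
        self.events = []      # (iteration, op, block_id, size)
        self.iteration = 0    # set by the training loop
        self._next_id = 0
        self._sizes = {}      # live block_id -> size

    def malloc(self, size):
        block_id = self._next_id
        self._next_id += 1
        self._sizes[block_id] = size
        self.events.append((self.iteration, "malloc", block_id, size))
        return block_id

    def free(self, block_id):
        size = self._sizes.pop(block_id)
        self.events.append((self.iteration, "free", block_id, size))


def iteration_signature(events, iteration):
    """Ordered (op, size) pairs of one iteration, ignoring block ids."""
    return [(op, size) for it, op, _, size in events if it == iteration]


def fake_training_step(alloc):
    """Synthetic stand-in for one training iteration (an assumption)."""
    activations = alloc.malloc(4096)
    gradients = alloc.malloc(4096)
    workspace = alloc.malloc(1024)
    alloc.free(workspace)
    alloc.free(gradients)
    alloc.free(activations)


alloc = TracingAllocator()
for step in range(3):
    alloc.iteration = step
    fake_training_step(alloc)

signatures = [iteration_signature(alloc.events, s) for s in range(3)]
# The pattern is "stable" if every iteration produces the same signature.
is_stable = all(sig == signatures[0] for sig in signatures)
print(is_stable)
```

Comparing signatures without block ids is a deliberate choice here: block ids differ between iterations, while the order and sizes of requests are what would repeat.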
With the growing ubiquity of the Internet of Things, in-the-edge inference of deep neural network models has been a major driver of the widespread use of intelligent applications. Because model inference characteristics are crucial for optimizing and deploying deep neural networks on hardware platforms, many studies analyze neural network performance in terms of latency, accuracy, throughput, and energy consumption. However, few existing works have discussed the runtime overheads hidden in neural network inference, even though these overheads are non-negligible for edge applications. The lack of in-depth analysis of these overheads hinders the understanding of how hardware designs and model structures affect on-device inference performance. In this paper, we characterize the runtime overheads of deep learning inference on representative edge devices using state-of-the-art neural network models, and perform a systematic analysis from the perspectives of end-to-end performance, hardware platforms, memory bandwidth, and neural network model structures. Based on the experimental results, we offer insights that facilitate the design and configuration of resource-efficient networks and the selection of appropriate models for a specific platform, providing architects and developers with a comprehensive view of the runtime overheads of in-the-edge neural network inference.
ISBN (digital): 9781450379991
ISBN (print): 9781728180571
Register renaming is key to the performance of out-of-order processors. However, the release mechanism for physical registers may waste registers in the time dimension. The register reuse technique is the earliest solution that releases a physical register at the renaming stage, taking advantage of register instances that are used only once. However, the range of reuse opportunities this scheme exploits is limited, and the physical structure of the registers has to be modified. To address these two problems, we propose an extended register reuse scheme. Our work presents: 1) prediction of the number of uses of each register instance, so that a physical register can be reused right after its last use, which expands the range of possible reuse; and 2) a low-overhead time-sharing register file design implemented with Backup Registers, which avoids modifying the physical register structure. Compared with the original register reuse technique, this work achieves an 8.5% performance improvement or, alternatively, a 9.6% reduction in the number of physical registers, with minor hardware overhead.
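The effect of releasing a physical register at its last use, rather than when its architectural register is redefined, can be illustrated with a toy model. This is a sketch under strong assumptions and not the scheme from the paper: use counts come from an oracle pass over the trace instead of a predictor, instructions complete in order, and misprediction recovery and the Backup Registers are not modeled. The names `count_uses` and `peak_live` are hypothetical.

```python
def count_uses(trace):
    """Oracle stand-in for use-count prediction: reads of each produced value."""
    uses = [0] * len(trace)
    producer = {}  # architectural register -> index of producing instruction
    for i, (dst, srcs) in enumerate(trace):
        for s in srcs:
            if s in producer:
                uses[producer[s]] += 1
        producer[dst] = i
    return uses


def peak_live(trace, early_release):
    """Peak number of physical registers held while walking the trace."""
    uses = count_uses(trace)
    mapping = {}     # architectural register -> current value id
    remaining = {}   # value id -> reads still expected
    live = set()     # value ids that currently hold a physical register
    peak = 0
    for i, (dst, srcs) in enumerate(trace):
        for s in srcs:
            v = mapping.get(s)
            if v is None or v not in live:
                continue
            remaining[v] -= 1
            if early_release and remaining[v] == 0:
                live.discard(v)      # released right after its last use
        old = mapping.get(dst)
        if old is not None:
            live.discard(old)        # conventional release on redefinition
        mapping[dst] = i
        remaining[i] = uses[i]
        live.add(i)
        peak = max(peak, len(live))
    return peak


# Each entry is (destination register, source registers): a dependency chain.
trace = [("r1", ()), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ("r3",))]
conventional = peak_live(trace, early_release=False)
with_reuse = peak_live(trace, early_release=True)
print(conventional, with_reuse)
```

In this chain every value is read exactly once, so with early release the destination of each instruction can take over the register its source just vacated.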
The growing demand for location-based services in areas like virtual reality, robot control, and navigation has intensified the focus on indoor localization. Visible light positioning (VLP), leveraging visible light c...
Due to mobility and frequent disconnections, the correctness of mobile interaction systems, such as mobile robot systems and mobile payment systems, is often difficult to analyze. This paper introduces three crit...
This paper implements tooth detection with deep convolutional neural networks. To address the low detection accuracy caused by the high similarity between teeth and by complex tooth textures, we propose an improved tooth detection method based on the YOLOv3-SPP model. First, the method incorporates the convolutional block attention module (CBAM) into the YOLOv3-SPP framework: adding channel attention and spatial attention mechanisms to the feature extraction network improves the network's features and enhances the saliency of the tooth target region in the image. Second, the CIoU bounding-box regression loss is introduced to improve localization accuracy. In addition, non-maximum suppression (NMS) is used to resolve overlapping candidate boxes. Experiments show that the mAP of the modified YOLOv3-SPP detection model improves to 86.8%, which indicates that the improved model can be applied to tooth detection.
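For reference, the two generic building blocks named above, the CIoU loss and greedy NMS, can be sketched in plain Python. This follows the commonly used textbook formulations and is not taken from the paper; the box format `(x1, y1, x2, y2)`, the 0.5 threshold, the `eps` guard, and the function names are all assumptions made for illustration.

```python
import math

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def ciou_loss(pred, target, eps=1e-9):
    """1 - CIoU: IoU term plus center-distance and aspect-ratio penalties."""
    i = iou(pred, target)
    # Squared distance between the two box centers.
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    rho2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2
    # Squared diagonal of the smallest box enclosing both.
    cw = max(pred[2], target[2]) - min(pred[0], target[0])
    ch = max(pred[3], target[3]) - min(pred[1], target[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its weight.
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    tw, th = target[2] - target[0], target[3] - target[1]
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - i + v + eps)
    return 1 - i + rho2 / c2 + alpha * v

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop others that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    for k in order:
        if all(iou(boxes[k], boxes[j]) <= iou_threshold for j in keep):
            keep.append(k)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the distant third box is kept.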
Effective medical image segmentation (MIS) gives physicians a substantial diagnostic basis for identifying the focal lesion in the patient's body and giving the subsequent clinical as...