ISBN (digital): 9781665423243
Deep Reinforcement Learning (DRL) is highly resource-intensive and requires large-scale distributed computing nodes to learn complicated tasks such as playing video games and Go. This work attempts to down-scale a distributed DRL system into a specialized many-core chip and achieve energy-efficient on-chip DRL. Using a customized Network-on-Chip that handles the communication of on-chip data and control signals, we propose a Synchronous Asynchronous RL architecture (SARLA) and a corresponding many-core chip that completely avoids the unnecessary data duplication and synchronization activities of multi-node RL systems. In evaluation, the SARLA system achieves a considerable energy-efficiency boost over GPU-based implementations for typical DRL workloads built with OpenAI-gym.
The browser is one of the most commonly used applications. Users tend to pursue a good user experience and care more about the performance of the browser, while ignoring the power consumption of the browser. This pape...
It's been a big year for low-energy nuclear reactions. LENRs, as they're known, are a fringe research topic that some physicists think could explain the results of an infamous experiment nearly 30 years ago that formed the basis for the idea of cold fusion. That idea didn't hold up, and only a handful of researchers around the world have continued trying to understand the mysterious nature of the inconsistent, heat-generating reactions that had spurred those claims.
In a hierarchically-structured cloud/edge/device computing environment, workload allocation can greatly affect the overall system performance. This paper deals with AI-oriented medical workload generated in emergency ...
ISBN (digital): 9781728149226
ISBN (print): 9781728149233
Convolutional neural networks (CNNs) have been widely used in many applications. Field-Programmable Gate Array (FPGA) based accelerators are an ideal solution for CNNs in embedded systems. However, the single event upset (SEU) effect in FPGA devices may have a significant influence on the performance of CNNs. In this paper, we analyze the sensitivity of CNNs to SEU and present a fault-tolerant design for CNN accelerators. First, we find that SEUs in processing elements (PEs) have the worst effect on CNNs, since they produce proportional errors and are not refreshed. Furthermore, we show that large positive perturbations account for almost all of the performance loss. Based on these observations, we propose an error detection scheme to locate incorrect PEs and an error masking method to achieve fault tolerance. Experiments demonstrate that the proposed method achieves fault-tolerant performance similar to that of the triple modular redundancy (TMR) scheme at much lower overhead.
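The abstract does not spell out the detection or masking logic, so the following is only a rough illustration of the general idea, not the paper's design. It assumes that a faulty PE shows up as an implausibly large positive output, and that masking means replacing such an output with zero. The function names and the `bound` parameter are hypothetical.

```python
def detect_faulty_pes(outputs, bound):
    """Return the indices of PEs whose output exceeds `bound`.

    Assumption (not from the paper): an upset PE is recognized by an
    implausibly large positive output value.
    """
    return [i for i, value in enumerate(outputs) if value > bound]


def mask_outputs(outputs, bound):
    """Replace outputs above `bound` with 0.0 and leave the rest unchanged.

    Only large positive values are masked, reflecting the observation that
    large positive perturbations cause almost all of the accuracy loss.
    """
    return [0.0 if value > bound else value for value in outputs]


pe_outputs = [0.5, 1.2, 900.0, -3.0]
faulty = detect_faulty_pes(pe_outputs, bound=10.0)
masked = mask_outputs(pe_outputs, bound=10.0)
```

In this sketch a single comparison per PE output stands in for the detector, which hints at why such a scheme could cost far less than triplicating the hardware as TMR does.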
ISBN (digital): 9781728173276
ISBN (print): 9781728173283
In this paper, we propose an enhanced handover scheme for cellular-connected UAVs. Specifically, our handover scheme considers the following characteristics: 1) A UAV can detect multiple cells with comparable RSRP levels, which may cause many unnecessary handovers. The handover event trigger parameters in our scheme are dynamically adjusted to prevent a UAV from handing over from one cell to another cell with a comparable RSRP level; 2) While taking off, the UAV flies through the null space of the antenna lobes many times, although each pass is normally very short. The RSRP varies quickly during takeoff, so the measurement reports may not provide accurate channel information for the UAV. In this case, when the link quality between the UAV and the BS falls below a threshold, the BS allows the link to be maintained for a while in the hope that the link quality will recover. We implement our proposed handover scheme on the NS3 platform and compare it with the current LTE handover scheme and the sojourn time estimation-based handover algorithm. Our simulation results demonstrate that our proposed scheme can significantly reduce the number of unnecessary handovers. Moreover, the network throughput of our scheme is improved, since the communication resources otherwise consumed by unnecessary handovers are used by the UAV for transmitting data.
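The two mechanisms described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' NS3 implementation: the hysteresis-style trigger, every threshold value, and the names `HandoverConfig`, `effective_hysteresis`, `should_handover`, and `LinkGuard` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HandoverConfig:
    # All values are illustrative assumptions, not taken from the paper.
    base_hysteresis_db: float = 3.0    # normal margin a neighbor must exceed
    comparable_margin_db: float = 2.0  # neighbors this close count as comparable
    extra_hysteresis_db: float = 3.0   # extra margin when comparable cells exist
    min_link_rsrp_dbm: float = -110.0  # link-quality threshold
    grace_steps: int = 3               # weak measurements tolerated in a row


def effective_hysteresis(serving_rsrp, neighbor_rsrps, cfg):
    """Raise the trigger margin when some neighbor has a comparable RSRP."""
    has_comparable = any(
        abs(rsrp - serving_rsrp) <= cfg.comparable_margin_db
        for rsrp in neighbor_rsrps
    )
    if has_comparable:
        return cfg.base_hysteresis_db + cfg.extra_hysteresis_db
    return cfg.base_hysteresis_db


def should_handover(serving_rsrp, neighbor_rsrps, cfg):
    """Trigger only if the best neighbor beats the serving cell by the margin."""
    if not neighbor_rsrps:
        return False
    margin = effective_hysteresis(serving_rsrp, neighbor_rsrps, cfg)
    return max(neighbor_rsrps) > serving_rsrp + margin


class LinkGuard:
    """Keeps a weak link alive for a few measurements before giving up."""

    def __init__(self, cfg):
        self.cfg = cfg
        self.weak_steps = 0

    def update(self, serving_rsrp):
        """Return True while the link should still be maintained."""
        if serving_rsrp >= self.cfg.min_link_rsrp_dbm:
            self.weak_steps = 0
            return True
        self.weak_steps += 1
        return self.weak_steps <= self.cfg.grace_steps
```

With these defaults, a serving cell at -90 dBm and neighbors at -89 and -88.5 dBm does not trigger a handover, because the comparable neighbors raise the margin to 6 dB, while a single neighbor at -80 dBm does. `LinkGuard` models the second mechanism: a brief dip during takeoff is tolerated, and only a sustained weak link is reported as lost.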
ISBN (digital): 9781728173276
ISBN (print): 9781728173283
Due to the short wavelength of millimeter wave (mmWave) and highly directional beamforming, massive MIMO systems are highly vulnerable to link blockage. Beam switching to an unblocked direction is an effective way to overcome blockage and restore communication links. To this end, a set of candidate beams for beam switching should be selected before the beam is blocked. However, due to the high-speed movement of the UAV, identifying the appropriate beam for a UAV at an arbitrary position is not trivial. In this work, a fast link recovery approach is proposed. Specifically, our proposed beam selection method considers the spatial correlation, the estimated reliability probability of the beams, and the signal quality. The simulation results show that the proposed method can efficiently recover the interrupted link, and the outage probability is reduced to almost 0% in the scenario where the UAV moves at high speed.
ISBN (digital): 9781728161495
ISBN (print): 9781728161501
In current convolutional neural network (CNN) accelerators, communication (i.e., memory access) dominates the energy consumption. This work provides a comprehensive analysis and methodologies to minimize communication in CNN accelerators. For off-chip communication, we derive the theoretical lower bound for any convolutional layer and propose a dataflow that reaches the lower bound. This fundamental problem has never been solved by prior studies. On-chip communication is minimized using an elaborate workload and storage mapping scheme. In addition, we design a communication-optimal CNN accelerator architecture. Evaluations based on 65 nm technology demonstrate that the proposed architecture nearly reaches the theoretical minimum communication in a three-level memory hierarchy and is computation dominant. The gap between the energy efficiency of our accelerator and the theoretical best value is only 37-87%.
This paper quantitatively reveals the state-of-the-art and state-of-the-practice AI systems only achieve acceptable performance on the stringent conditions that all categories of subjects are known, which we call clos...
Recent advances in machine learning, wireless communication, and mobile hardware technologies promisingly enable federated learning (FL) over massive mobile edge devices, which opens new horizons for numerous intellig...