ISBN (Print): 9781665478960
To address the problems of low model accuracy, limited computing power, poor parallel capability, and excessive power consumption when deploying RGBD-based 3D target detection models on embedded devices, this paper first proposes an improved RGBD 3D target detection model based on the ENet semantic segmentation model: ENet is used as the semantic segmentation network, and RGB images and depth information are fused to realize 3D target detection. Second, to make the model deployable at the edge, this paper constructs a lightweight network by pruning the ENet model in its down-sampling stages. Finally, this paper uses the Xilinx ZCU104 as the hardware development kit, with the FPGA serving as the auxiliary parallel processing unit and the ARM processor as the main processing unit, forming a heterogeneous computing architecture capable of handling complex operations. The architecture uses the FPGA to accelerate the deep model in parallel, which increases operating speed and reduces power consumption. The test results of the model on the ZCU104 are compared with other hardware. The results show that, while maintaining accuracy, the heterogeneous computing architecture used in this paper consumes 93% less power than an Intel Xeon E5-2620 v4 CPU, runs 12 times faster than the CPU, and runs more than 180 times faster than the ARM Cortex-A53 commonly used at the edge.
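For a rough sense of what the quoted figures imply together, the sketch below combines the reported 93% power reduction and 12x speedup into an energy-per-inference comparison. The absolute CPU power and latency values are placeholder assumptions, not numbers from the paper; only the two ratios come from the abstract.

```python
# Back-of-the-envelope energy comparison derived from the figures quoted in the
# abstract above (93% lower power, 12x faster than an Intel Xeon E5-2620 v4).
# The absolute CPU numbers below are illustrative placeholders, not measurements.

cpu_power_w = 85.0        # assumed Xeon E5-2620 v4 package power (placeholder)
cpu_latency_s = 1.0       # assumed per-frame inference time on the CPU (placeholder)

fpga_power_w = cpu_power_w * (1.0 - 0.93)   # "93% lower power consumption"
fpga_latency_s = cpu_latency_s / 12.0       # "12 times higher speed"

cpu_energy_j = cpu_power_w * cpu_latency_s
fpga_energy_j = fpga_power_w * fpga_latency_s

print(f"Energy per inference: CPU {cpu_energy_j:.1f} J, FPGA+ARM {fpga_energy_j:.2f} J")
print(f"Implied energy-efficiency gain: {cpu_energy_j / fpga_energy_j:.0f}x")
```

Whatever placeholder values are chosen, the combined ratio is fixed by the abstract's two figures: 0.07 of the power at 1/12 of the time works out to roughly 170x less energy per inference.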
Memristor-based accelerators (MBAs) have demonstrated their capability in accelerating matrix-vector multiplication (MVM) with high performance and energy efficiency. However, it is hard to determine whether and how well an application can benefit from MBAs in a heterogeneous computing architecture. In this article, we propose a simulation framework called MHSim to evaluate the energy efficiency and performance of applications running with both MBAs and CPUs. MHSim provides flexible system-level interfaces and circuit-level simulation models for designers to configure heterogeneous computing architectures. We design a general-purpose MBA that enables floating-point computation models for general matrix-matrix multiplication (GEMM). Our simulation framework can quantify the performance and energy efficiency of different MBA architectures for various applications. We validate our simulation framework with SPICE and evaluate the accuracy and performance of MBAs via several case studies. Experimental results demonstrate that the deviations of energy consumption and latency are only 0.47% and 0.49% on average compared with SPICE-based simulation.
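As background on how an MBA carries out MVM, the sketch below models a crossbar in which weights are stored as conductances and the bit-line currents realize the product. It is a conceptual illustration only; the conductance range and the simple shift-based weight mapping are assumptions, and none of this reflects MHSim's actual interfaces or circuit models.

```python
import numpy as np

# Conceptual memristor-crossbar MVM: each weight is stored as a conductance,
# the input vector is applied as word-line voltages, and the bit-line currents
# (column-wise sums of G * V per Kirchhoff's current law) give the matrix-vector
# product in one analog step.

rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 8))          # logical weight matrix (word lines x bit lines)

g_min, g_max = 1e-6, 1e-4                      # assumed conductance range in siemens
w_min, w_max = weights.min(), weights.max()
# Simple shift-and-scale mapping to positive conductances; real designs typically
# use differential conductance pairs to represent signed weights.
conductance = g_min + (weights - w_min) / (w_max - w_min) * (g_max - g_min)

voltages = rng.uniform(0.0, 0.2, size=4)       # word-line input voltages in volts
bitline_currents = voltages @ conductance      # analog MVM: current summed per column

print("bit-line currents (A):", bitline_currents)
```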
Deployment of modern TinyML tasks on small battery-constrained IoT devices requires high computational energy efficiency. Analog in-memory computing (IMC) using non-volatile memory (NVM) promises major efficiency improvements in deep neural network (DNN) inference and serves as on-chip memory storage for DNN weights. However, IMC's functional flexibility limitations and their impact on performance, energy, and area efficiency are not yet fully understood at the system level. To target practical end-to-end IoT applications, IMC arrays must be enclosed in heterogeneous programmable systems, introducing new system-level challenges which we aim to address in this work. We present a heterogeneous tightly-coupled clustered architecture integrating 8 RISC-V cores, an in-memory computing accelerator (IMA), and digital accelerators. We benchmark the system on a highly heterogeneous workload such as the Bottleneck layer from a MobileNetV2, showing 11.5x performance and 9.5x energy efficiency improvements compared to highly optimized parallel execution on the cores. Furthermore, we explore the requirements for end-to-end inference of a full mobile-grade DNN (MobileNetV2) in terms of IMC array resources by scaling up our heterogeneous architecture to a multi-array accelerator. Our results show that our solution, on end-to-end inference of MobileNetV2, is one order of magnitude better in terms of execution latency than existing programmable architectures and two orders of magnitude better than state-of-the-art heterogeneous solutions integrating in-memory computing analog cores.
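To see why the Bottleneck layer is a natural heterogeneity benchmark, the sketch below breaks one standard MobileNetV2 bottleneck block into its MAC counts. The channel and resolution figures correspond to one of MobileNetV2's published 14x14 stages; the comments about which kernel suits which engine describe a typical IMC/digital split and are an assumption, not the paper's exact partitioning.

```python
# Rough MAC-count breakdown of one MobileNetV2 bottleneck block. The 1x1
# (pointwise) convolutions are dense matrix products that map naturally onto an
# IMC array, while the 3x3 depthwise convolution has little weight reuse and is
# commonly left to the digital cores, which is what makes the block heterogeneous.

h = w = 14                 # spatial resolution of the feature map
c_in, c_out = 64, 64       # input/output channels of this block
expansion = 6              # MobileNetV2 expansion factor t
c_mid = c_in * expansion

macs_expand = h * w * c_in * c_mid      # 1x1 pointwise expansion
macs_dw = h * w * c_mid * 3 * 3         # 3x3 depthwise convolution
macs_project = h * w * c_mid * c_out    # 1x1 pointwise projection

total = macs_expand + macs_dw + macs_project
for name, macs in [("1x1 expand", macs_expand),
                   ("3x3 depthwise", macs_dw),
                   ("1x1 project", macs_project)]:
    print(f"{name:14s} {macs / 1e6:6.2f} MMAC ({100 * macs / total:4.1f}%)")
```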
ISBN (Print): 9781424483556
In modern online multi-player games, game providers are struggling to keep up with the many different types of cheating. Cheat detection is a task that requires a lot of computational resources. Advances made within the field of heterogeneous computing architectures, such as graphics processing units (GPUs), have given developers easier access to considerably more computational resources, enabling a new approach to solving this issue. In this paper, we have developed a small game simulator that includes a customizable physics engine and a cheat detection mechanism that checks the physical model used by the game. To make sure that the mechanisms are fair to all players, they are executed on the server side of the game system. We investigate the advantages of implementing physics cheat detection mechanisms on a GPU using the Nvidia CUDA framework, and we compare the GPU implementation of the cheat detection mechanism with a CPU implementation. The results obtained from the simulations show that offloading the cheat detection mechanisms to the GPU reduces the time spent on cheat detection, enabling the servers to support a larger number of clients.
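As a concrete illustration of a server-side physics check, the sketch below flags clients whose reported movement exceeds what the movement model allows in one update interval. It is a simplified NumPy stand-in for the kind of data-parallel test the paper offloads to the GPU, not the authors' CUDA implementation; the function name, speed limit, and tolerance are hypothetical.

```python
import numpy as np

# Simplified server-side speed-hack check: for every client, compare the distance
# moved between two consecutive state updates with the maximum distance the
# physics model allows in that interval. The per-client independence is what makes
# this kind of check amenable to GPU offloading.

def detect_speed_cheats(prev_pos, curr_pos, dt, max_speed, tolerance=1.05):
    """Return a boolean mask of clients whose movement exceeds the physics limit."""
    displacement = np.linalg.norm(curr_pos - prev_pos, axis=1)
    return displacement > max_speed * dt * tolerance

rng = np.random.default_rng(42)
n_clients = 10_000
prev_pos = rng.uniform(0, 1000, size=(n_clients, 3))
step = rng.normal(0, 0.1, size=(n_clients, 3))     # legitimate small movements
step[:5] *= 100.0                                  # a few teleporting cheaters
curr_pos = prev_pos + step

flags = detect_speed_cheats(prev_pos, curr_pos, dt=0.05, max_speed=10.0)
print(f"Flagged {flags.sum()} of {n_clients} clients")
```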