Real-time object detection demands high throughput and low latency, which makes hardware accelerators necessary. A neural processing unit (NPU) is specialized hardware designed to accelerate the computation of deep learning models, offering better energy efficiency and parallel processing performance than conventional CPUs or GPUs. It is particularly important for reducing latency and increasing processing speed in applications that require real-time processing. In this paper, we construct a real-time object detection system based on YOLOv3 using Neubla's Antara NPU, and propose two approaches for performance optimization. First, we keep NPU inference continuous by using double buffering, which lets the CPU prepare the next input in advance. Second, in a multi-NPU environment, we distribute tasks among NPUs through queue-based processing and analyze the performance limits using Amdahl's law. Experimental results show that, compared to a CPU-only environment, throughput improved by 2.13 times with the NPU and single buffering, by 3.35 times with double buffering, and by 4.81 times in the multi-NPU environment. Latency decreased by a factor of 1.6 with single and double buffering, and by a factor of 1.18 in the multi-NPU environment. Accuracy remained consistent, with 31.4 mAP on the CPU and 31.8 mAP on the NPU.
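The two ideas named in this abstract, overlapping CPU preparation with accelerator inference and bounding multi-device speedup with Amdahl's law, can be illustrated with a minimal Python sketch. It is not the authors' implementation: `run_pipeline`, `preprocess`, `infer`, and `amdahl_speedup` are hypothetical names, the accelerator call is replaced by an ordinary function, and a bounded queue stands in for the paper's double buffering.

```python
import queue
import threading


def amdahl_speedup(parallel_fraction: float, n_units: int) -> float:
    """Amdahl's law: upper bound on speedup when only `parallel_fraction`
    of the total work can be spread across `n_units` processing units."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_units)


def run_pipeline(frames, preprocess, infer, depth=1):
    """Overlap CPU-side preprocessing with inference.

    A producer thread prepares upcoming inputs while the caller's thread
    runs inference on the current one. The bounded queue (assumed depth 1)
    limits how far the producer may run ahead of the consumer.
    """
    buffer = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def producer():
        try:
            for frame in frames:
                buffer.put(preprocess(frame))  # blocks while the buffer is full
        finally:
            buffer.put(done)  # always release the consumer, even on error

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    results = []
    while True:
        item = buffer.get()
        if item is done:
            break
        results.append(infer(item))  # stand-in for the accelerator call
    worker.join()
    return results
```

Calling `run_pipeline(frames, preprocess, infer)` returns the inference results in input order. `amdahl_speedup(p, n)` gives the theoretical ceiling for `n` devices when a fraction `p` of the per-frame work is parallelizable. The value of `p` has to be measured for a concrete system and is not taken from the paper.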
ISBN: 0819452114 (print)
This paper addresses two issues related to motion estimation using block matching algorithms (BMA): (1) determining the reliability of the motion vector of each block, and (2) imposing a smoothness constraint on the motion vector field. We introduce a new robust reliability measure, derived from the cost function distribution, that represents the confidence level of a motion vector. We also propose a novel algorithm that incorporates the smoothness constraint into the motion vector field evaluation by means of a priority queue ordered by the reliability measure. In this framework, a smooth motion vector field is evaluated in a single pass, without the iterations typical of many existing optical flow estimation algorithms. The method is therefore fast and can easily be incorporated into real-time applications for video compression as well as image segmentation.
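The general shape of such a scheme can be sketched in Python under stated assumptions. The abstract does not define the reliability measure or the smoothness term, so both are placeholders here: reliability is assumed to be the gap between the two lowest block-matching costs, and smoothness is assumed to be an L1 penalty against neighbours that have already been assigned. All function names and the parameters `bs`, `r`, and `lam` are hypothetical.

```python
import heapq


def sad(prev, curr, bx, by, dx, dy, bs):
    """Sum of absolute differences between a block in `curr` and the
    block displaced by (dx, dy) in `prev`."""
    return sum(abs(curr[y][x] - prev[y + dy][x + dx])
               for y in range(by, by + bs) for x in range(bx, bx + bs))


def cost_surface(prev, curr, bx, by, bs, r):
    """Matching cost for every in-bounds displacement within radius r."""
    h, w = len(prev), len(prev[0])
    return {(dx, dy): sad(prev, curr, bx, by, dx, dy, bs)
            for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if 0 <= by + dy and by + dy + bs <= h
            and 0 <= bx + dx and bx + dx + bs <= w}


def reliability(costs):
    """Assumed measure: gap between the two lowest costs, so a sharp,
    unambiguous minimum scores high."""
    ordered = sorted(costs.values())
    return ordered[1] - ordered[0] if len(ordered) > 1 else 0


def estimate_motion(prev, curr, bs=4, r=2, lam=1.0):
    """Assign motion vectors in a single pass, most reliable block first."""
    h, w = len(curr), len(curr[0])
    surfaces = {(bx, by): cost_surface(prev, curr, bx, by, bs, r)
                for by in range(0, h - bs + 1, bs)
                for bx in range(0, w - bs + 1, bs)}
    # heapq is a min-heap, so negate reliability to pop the largest first.
    heap = [(-reliability(c), pos) for pos, c in surfaces.items()]
    heapq.heapify(heap)
    field = {}
    while heap:
        _, (bx, by) = heapq.heappop(heap)
        # Vectors of 4-connected neighbours that are already assigned.
        near = [field[p] for p in ((bx - bs, by), (bx + bs, by),
                                   (bx, by - bs), (bx, by + bs)) if p in field]

        def penalised(item):
            (dx, dy), cost = item
            return cost + lam * sum(abs(dx - nx) + abs(dy - ny)
                                    for nx, ny in near)

        field[(bx, by)] = min(surfaces[(bx, by)].items(), key=penalised)[0]
    return field
```

`estimate_motion(prev, curr)` takes two equally sized frames as lists of rows and returns a dictionary that maps each block's top-left corner to its `(dx, dy)` vector. Because reliable blocks are finalized first, their vectors steer the ambiguous blocks processed later, which is how a single ordered pass can stand in for iterative refinement.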