Video object detection (VOD) is a challenging task: image object detectors struggle with the appearance-degradation phenomena that occur in certain video frames. Moreover, existing VOD research mostly trades high computational cost for accuracy, making it difficult to balance accuracy and speed. This work proposes an optimized Real-Time Detection Transformer (RT-DETR) model for VOD that introduces a decoupled Feature Aggregation Module (FAM) to separately refine the localization and classification detection heads. The method achieves a significant accuracy improvement with only a minimal increase in parameter count. Specifically, we insert a FAM before each of the localization and classification heads. We first freeze all parameters of the feature extractor and the classification head and train only the localization head, obtaining more accurate localization results; we then freeze all parameters of the feature extractor and the localization head and train only the classification head, improving the final detection accuracy. Extensive ablation experiments verify the effectiveness of the method. Without any post-processing, we achieve 90.0% mAP on the ImageNet-VID dataset with only 77.9M parameters and an average inference time of 14.1 ms.
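The two-stage freezing schedule described in this abstract can be sketched as follows. This is a minimal illustrative mock, not the authors' RT-DETR code: the component names (`backbone`, `fam_loc`, `loc_head`, `fam_cls`, `cls_head`) and the toy update rule are assumptions standing in for real parameter groups and gradient steps.

```python
# Hypothetical sketch of the two-stage head refinement: freeze everything
# except one branch, train it, then swap to the other branch.

class Param:
    def __init__(self, value=0.0):
        self.value = value
        self.requires_grad = True

class Detector:
    def __init__(self):
        # Each component is a dict of named parameters (toy stand-ins).
        self.groups = {
            "backbone": {"w": Param()},
            "fam_loc":  {"w": Param()},   # FAM before the localization head
            "loc_head": {"w": Param()},
            "fam_cls":  {"w": Param()},   # FAM before the classification head
            "cls_head": {"w": Param()},
        }

def set_trainable(model, trainable_names):
    """Freeze every parameter group except those listed."""
    for name, group in model.groups.items():
        for p in group.values():
            p.requires_grad = name in trainable_names

def train_step(model, lr=0.1):
    """Mock update: only unfrozen parameters move."""
    for group in model.groups.values():
        for p in group.values():
            if p.requires_grad:
                p.value += lr   # stand-in for a gradient step

model = Detector()
# Stage 1: train only the localization branch (FAM + localization head).
set_trainable(model, {"fam_loc", "loc_head"})
train_step(model)
# Stage 2: train only the classification branch (FAM + classification head).
set_trainable(model, {"fam_cls", "cls_head"})
train_step(model)
```

After both stages the backbone is untouched while each head branch has received exactly one stage of updates, which is the property the decoupled schedule relies on.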
This study proposes a real-time parking-space detection method using a pixel-based image-processing technique. The proposed algorithm detects free spaces in a video stream of a busy parking place. The performance of the algorithm...
ISBN (digital): 9781665496209
ISBN (print): 9781665496209
Video object detection aims to detect and track every object in a given video. However, owing to appearance deterioration in video, it remains challenging to obtain good results when traditional image object detection methods are applied to videos. In this paper, we propose a new feature aggregation method, Dual Feature Aggregation (DualFeat), for video object detection. By effectively combining temporal and spatial attention mechanisms, we make full use of the temporal and spatial information in videos. Meanwhile, we leverage a real-time tracker to track detected objects across video frames, where features are aggregated again with previously obtained features. This yields more comprehensive and richer features, greatly improving the accuracy of video object detection. Experiments on the ILSVRC2017 dataset verify the effectiveness of our method.
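The core idea behind attention-based temporal aggregation can be sketched in a few lines: support-frame features are blended into the key frame, each weighted by its similarity to the key frame. This is a generic illustration of the technique, not the DualFeat implementation; the function names and the use of plain Python lists are assumptions.

```python
# Minimal sketch of attention-weighted temporal feature aggregation:
# similar frames contribute more to the aggregated key-frame feature.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def aggregate(key_feat, support_feats):
    """Blend support-frame features into the key frame, weighting each
    support feature by its similarity (dot product) to the key frame."""
    weights = softmax([dot(key_feat, f) for f in support_feats])
    dim = len(key_feat)
    return [sum(w * f[i] for w, f in zip(weights, support_feats))
            for i in range(dim)]

key = [1.0, 0.0]
supports = [[1.0, 0.0],   # similar frame: gets the larger weight
            [0.0, 1.0]]   # dissimilar frame: gets the smaller weight
agg = aggregate(key, supports)
```

Because the weights come from a softmax over similarities, a degraded key frame still ends up dominated by the support frames that resemble it most.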
In the field of multi-object tracking, this study introduces an innovative framework designed to address the challenges posed by frame loss in image sequences, particularly within the contexts of video surveillance an...
Object detection methods based on deep learning have made great progress in recent years and have been used successfully in many different applications. However, since they have been evaluated predominantly on datasets of natural images, it is still unclear how accurate and effective they can be when used in special domain applications, for example on scientific or industrial images, where the properties of the images are very different from those taken in natural scenes. In this study, we illustrate the challenges one must face in such a setting on a concrete practical application: the detection of a particular fluid phenomenon, bag breakup, in images of droplet scattering, which differ significantly from natural images. Using two technologically mature, state-of-the-art object detection methods, RetinaNet and YOLOv7, we discuss which strategies need to be considered in this problem setting, and perform both quantitative and qualitative evaluations to study their effects. Additionally, we propose a new method to further improve detection accuracy by utilizing information from several consecutive frames. We hope that the practical insights gained in this study will be of use to other researchers and practitioners targeting applications where the images differ greatly from natural images.
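One common way to exploit consecutive frames, in the spirit of the multi-frame refinement mentioned above, is to keep a detection only if matching boxes appear in enough neighbouring frames. The sketch below is a hedged illustration of that general idea; the IoU threshold, the voting rule, and the function names are assumptions, not the paper's actual method.

```python
# Toy multi-frame confirmation: a box survives only if it is supported
# by overlapping boxes in a minimum number of neighbouring frames.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def confirm(box, neighbour_frames, thr=0.5, min_support=2):
    """Count neighbouring frames containing a box that overlaps `box`
    by at least `thr` IoU; keep the detection if enough frames agree."""
    support = sum(any(iou(box, b) >= thr for b in frame)
                  for frame in neighbour_frames)
    return support >= min_support

stable = (0.0, 0.0, 10.0, 10.0)
frames = [[(0.5, 0.0, 10.0, 10.0)],   # slightly shifted match
          [(0.0, 0.5, 10.0, 10.5)],   # slightly shifted match
          []]                          # frame with no detections
keep = confirm(stable, frames)                      # supported twice
flicker = confirm((50.0, 50.0, 60.0, 60.0), frames) # supported nowhere
```

The same voting logic can suppress single-frame false positives caused by the unusual image statistics the abstract describes.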
Convolutional neural networks are a powerful category of artificial neural networks that can extract features from raw data, greatly reducing parametric complexity and enhancing pattern recognition and prediction accuracy. Optical neural networks offer the promise of dramatically accelerating computing speed while maintaining low power consumption, even when handling high-speed data streams running at hundreds of gigabits per second. Here, we propose an optical convolutional processor (CP) that leverages the spectral response of an arrayed waveguide grating (AWG) to enhance convolution speed by eliminating the need for repetitive element-wise multiplication. Our design features a balanced AWG configuration, enabling both the positive and negative weightings essential for convolutional kernels. A proof-of-concept 8-bit resolution processor is experimentally implemented using a pair of AWGs with a broadband Mach-Zehnder interferometer (MZI) designed to achieve uniform weighting across the whole spectrum. Experimental results demonstrate the CP's effectiveness in edge detection, and it achieves 96% accuracy in a convolutional neural network for MNIST recognition. This approach can be extended to other common operations, such as pooling and deconvolution in Generative Adversarial Networks. It is also scalable to more complex networks, making it suitable for applications like autonomous vehicles and real-time video recognition. A novel convolutional processor is proposed using the shifted spectral response of a pair of arrayed waveguide gratings (AWGs) to mimic the kernel shifts during image convolution. This inherent mixing of inputs in the AWG's spectral response eliminates the need for repetitive element-wise computations while enabling the simultaneous generation of convolved output maps.
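A software analogue helps clarify the claim about eliminating repetitive element-wise multiplication: the AWG's shifted spectral responses act like the rows of a banded matrix, so one matrix-vector product yields every convolution output simultaneously instead of sliding the kernel step by step. The sketch below is a conceptual illustration of that linear-algebra view, not a model of the optical hardware.

```python
# Express a 1-D "valid" convolution as a single matrix-vector product:
# each matrix row is a shifted copy of the kernel, mirroring how the
# AWG's shifted spectral responses produce all outputs at once.

def conv_matrix(kernel, n):
    """Build the (n - k + 1) x n matrix whose rows are shifted
    copies of the kernel."""
    k = len(kernel)
    rows = []
    for shift in range(n - k + 1):
        row = [0.0] * n
        row[shift:shift + k] = kernel
        rows.append(row)
    return rows

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, -1.0]   # mixed-sign weights, like the balanced AWG
                       # configuration's positive/negative weighting
out = matvec(conv_matrix(kernel, len(signal)), signal)
```

For this ramp signal the difference kernel produces a constant output, the 1-D analogue of the edge-detection demonstration in the abstract.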
Typically, video production requires a lot of manual labor. Sport events can be several hours long, and during that time the camera equipment requires an operator. The cost of labor and the large amount of time spent on video production increase demand for automated video production solutions. However, automatic video production is not a simple task. Sport venues are typically quite large, so capturing the whole venue with a single static camera is often not possible. A solution for increasing the field of view is to create a panoramic video using multiple synchronized cameras. The main problem with this approach is that panorama stitching is computationally expensive. Another challenge is creating natural-looking panoramas. Images captured with different cameras can have color or exposure differences, and the final panoramas might have visible seams or alignment errors. Although image stitching methods have been known for a long time, software solutions capable of creating high-quality panorama videos are lacking. Most software implementations focus on still-image stitching rather than video stitching. This thesis presents a fully automated cloud-based video production system that produces panoramic videos of sports events. The system consists of multiple software components: the video recorder, the event recording manager, the video processing, and the video archive. The focus of this thesis is on the video processing software's image-processing pipeline. The pipeline is implemented as a graph, where each processing step is implemented as a node. The processing steps are distortion correction, cylindrical projection, color and exposure compensation, image blending, and panorama composition. The implemented software is capable of stitching four 2160p video streams into a 7200×3584 resolution panorama stream in real time using an NVIDIA Tesla P4 graphics card. This makes high-quality broadcasts of sport events possible. Compared to traditional broadcasts, the panorama video
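The graph-of-nodes pipeline described in this abstract can be sketched as a chain of processing stages. The node names below mirror the thesis text, but the pass-through implementations are placeholders assumed for illustration; the real stages are GPU image operations, not list manipulations.

```python
# Illustrative pipeline graph: each processing step is a node, and
# frames from the synchronized cameras flow through the nodes in order.

class Node:
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def __call__(self, frames):
        return self.fn(frames)

def make_pipeline(steps):
    """Compose the nodes into a single callable pipeline."""
    def run(frames):
        for node in steps:
            frames = node(frames)
        return frames
    return run

# Placeholder stage bodies; the real pipeline performs per-camera image
# corrections before merging everything into one panorama.
identity = lambda frames: frames
compose = lambda frames: [sum(frames, [])]   # stitch the streams together

pipeline = make_pipeline([
    Node("distortion_correction", identity),
    Node("cylindrical_projection", identity),
    Node("color_exposure_compensation", identity),
    Node("image_blending", identity),
    Node("panorama_composition", compose),
])

# Four camera streams (toy per-camera pixel lists) become one panorama.
panorama = pipeline([[1, 2], [3, 4], [5, 6], [7, 8]])
```

Modeling the pipeline as a graph of nodes keeps each stage independently testable and lets stages be reordered or swapped without touching the rest of the system.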
This work offers a thorough method for real-time dehazing of drone-captured images using different filtering techniques with post-processing improvements. Enhancing visibility and picture clarity in hazy situations is th...
Video super-resolution reconstruction consists of generating high-resolution frames by processing low-resolution ones. This process enhances video quality, allowing the visualisation of fine details. Moreover, it ...
ISBN (digital): 9781510661714
ISBN (print): 9781510661707; 9781510661714
The Internet of Things (IoT) uses cloud-enabled data sharing to connect physical objects to sensors, processing software, and other technologies via the Internet. IoT enables a vast communication network amongst these physical objects and their corresponding data. This study investigates the use of an IoT development board for real-time sensor-data communication and processing, specifically of images from a camera. The IoT development board and camera are programmed to capture images for object detection and analysis. Data processing is performed on board, which includes the microcontroller and wireless communication with the sensor. The IoT connectivity and simulated test results verifying real-time signal communication and processing are presented.
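The capture-process-transmit loop such a board runs can be sketched as below. Every interface here is a stand-in assumed for illustration: the abstract does not name the board's SDK, so `capture`, `detect`, and `transmit` are hypothetical placeholders for the camera read, on-board processing, and wireless send.

```python
# Toy on-board loop: read a frame, process it locally on the
# microcontroller, then transmit only the compact result.

def capture():
    """Stand-in for reading a camera frame (a tiny 2x2 grayscale image)."""
    return [[0, 255], [255, 0]]

def detect(frame):
    """Toy on-board processing: count bright pixels as detected 'objects'."""
    return sum(px > 128 for row in frame for px in row)

def transmit(payload, log):
    """Stand-in for the wireless send; here it just records the payload."""
    log.append(payload)

log = []
for _ in range(3):          # three capture cycles
    frame = capture()
    transmit({"objects": detect(frame)}, log)
```

Doing the detection on board and transmitting only the summary, rather than raw frames, is what keeps the wireless link usable for real-time operation.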