Video object detection (VOD) is a challenging task: image object detectors struggle with the appearance-degradation phenomena that occur in certain video frames. Moreover, existing VOD research mostly trades high computational cost for accuracy, making it difficult to balance accuracy and speed. This work proposes an optimized Real-Time Detection Transformer (RT-DETR) model for VOD that introduces a decoupled Feature Aggregation Module (FAM) to separately refine the localization and classification detection heads. The method achieves a significant accuracy improvement with only a minimal increase in parameter count. Specifically, we insert a FAM before each of the localization and classification heads. We first freeze all parameters of the feature extractor and the classification head and train only the localization head, obtaining more accurate localization results; we then freeze all parameters of the feature extractor and the localization head and train only the classification head, improving the final detection accuracy. Extensive ablation experiments verify the effectiveness of the method. Without any post-processing, we achieve 90.0% mAP on the ImageNet-VID dataset with only 77.9M parameters and an average inference time of 14.1 ms.
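The two-stage freezing schedule described in this abstract can be sketched as follows. This is a minimal illustrative mock, not the authors' RT-DETR code: the component names (`backbone`, `fam_loc`, `loc_head`, `fam_cls`, `cls_head`) and the toy update rule are assumptions standing in for real parameter groups and gradient steps.

```python
# Hypothetical sketch of the two-stage head refinement: freeze everything
# except one branch, train it, then swap to the other branch.

class Param:
    def __init__(self, value=0.0):
        self.value = value
        self.requires_grad = True

class Detector:
    def __init__(self):
        # Each component is a dict of named parameters (toy stand-ins).
        self.groups = {
            "backbone": {"w": Param()},
            "fam_loc":  {"w": Param()},   # FAM before the localization head
            "loc_head": {"w": Param()},
            "fam_cls":  {"w": Param()},   # FAM before the classification head
            "cls_head": {"w": Param()},
        }

def set_trainable(model, trainable_names):
    """Freeze every parameter group except those listed."""
    for name, group in model.groups.items():
        for p in group.values():
            p.requires_grad = name in trainable_names

def train_step(model, lr=0.1):
    """Mock update: only unfrozen parameters move."""
    for group in model.groups.values():
        for p in group.values():
            if p.requires_grad:
                p.value += lr   # stand-in for a gradient step

model = Detector()
# Stage 1: train only the localization branch (FAM + localization head).
set_trainable(model, {"fam_loc", "loc_head"})
train_step(model)
# Stage 2: train only the classification branch (FAM + classification head).
set_trainable(model, {"fam_cls", "cls_head"})
train_step(model)
```

After both stages the backbone is untouched while each head branch has received exactly one stage of updates, which is the property the decoupled schedule relies on.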
This study proposes a real-time parking-space detection method using a pixel-based image-processing technique. The proposed algorithm detects free spaces in a video stream of a busy parking place. The performance of the algorithm...
ISBN (digital): 9781665496209
ISBN (print): 9781665496209
Video object detection aims to detect and track every object in a given video. However, owing to appearance deterioration in video, it remains challenging to obtain good results when traditional image object detection methods are applied to videos. In this paper, we propose a new feature aggregation method, Dual Feature Aggregation (DualFeat), for video object detection. By effectively combining temporal and spatial attention mechanisms, we make full use of the temporal and spatial information in videos. Meanwhile, we leverage a real-time tracker to track detected objects across video frames, where features are aggregated again with previously obtained features. This yields more comprehensive and richer features, greatly improving the accuracy of video object detection. Experiments on the ILSVRC2017 dataset verify the effectiveness of our method.
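The core idea behind attention-based temporal aggregation can be sketched in a few lines: support-frame features are blended into the key frame, each weighted by its similarity to the key frame. This is a generic illustration of the technique, not the DualFeat implementation; the function names and the use of plain Python lists are assumptions.

```python
# Minimal sketch of attention-weighted temporal feature aggregation:
# similar frames contribute more to the aggregated key-frame feature.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def aggregate(key_feat, support_feats):
    """Blend support-frame features into the key frame, weighting each
    support feature by its similarity (dot product) to the key frame."""
    weights = softmax([dot(key_feat, f) for f in support_feats])
    dim = len(key_feat)
    return [sum(w * f[i] for w, f in zip(weights, support_feats))
            for i in range(dim)]

key = [1.0, 0.0]
supports = [[1.0, 0.0],   # similar frame: gets the larger weight
            [0.0, 1.0]]   # dissimilar frame: gets the smaller weight
agg = aggregate(key, supports)
```

Because the weights come from a softmax over similarities, a degraded key frame still ends up dominated by the support frames that resemble it most.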
In the field of multi-object tracking, this study introduces an innovative framework designed to address the challenges posed by frame loss in image sequences, particularly within the contexts of video surveillance an...
Object detection methods based on deep learning have made great progress in recent years and have been used successfully in many different applications. However, since they have been evaluated predominantly on datasets of natural images, it is still unclear how accurate and effective they can be when used in special domain applications, for example on scientific or industrial images, where the properties of the images are very different from those taken in natural scenes. In this study, we illustrate the challenges one must face in such a setting on a concrete practical application: the detection of a particular fluid phenomenon, bag breakup, in images of droplet scattering, which differ significantly from natural images. Using two technologically mature, state-of-the-art object detection methods, RetinaNet and YOLOv7, we discuss which strategies need to be considered in this problem setting, and perform both quantitative and qualitative evaluations to study their effects. Additionally, we propose a new method to further improve detection accuracy by utilizing information from several consecutive frames. We hope that the practical insights gained in this study will be of use to other researchers and practitioners targeting applications where the images differ greatly from natural images.
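One common way to exploit consecutive frames, in the spirit of the multi-frame refinement mentioned above, is to keep a detection only if matching boxes appear in enough neighbouring frames. The sketch below is a hedged illustration of that general idea; the IoU threshold, the voting rule, and the function names are assumptions, not the paper's actual method.

```python
# Toy multi-frame confirmation: a box survives only if it is supported
# by overlapping boxes in a minimum number of neighbouring frames.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def confirm(box, neighbour_frames, thr=0.5, min_support=2):
    """Count neighbouring frames containing a box that overlaps `box`
    by at least `thr` IoU; keep the detection if enough frames agree."""
    support = sum(any(iou(box, b) >= thr for b in frame)
                  for frame in neighbour_frames)
    return support >= min_support

stable = (0.0, 0.0, 10.0, 10.0)
frames = [[(0.5, 0.0, 10.0, 10.0)],   # slightly shifted match
          [(0.0, 0.5, 10.0, 10.5)],   # slightly shifted match
          []]                          # frame with no detections
keep = confirm(stable, frames)                      # supported twice
flicker = confirm((50.0, 50.0, 60.0, 60.0), frames) # supported nowhere
```

The same voting logic can suppress single-frame false positives caused by the unusual image statistics the abstract describes.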
Convolutional neural networks are a powerful category of artificial neural networks that can extract features from raw data, greatly reducing parametric complexity and enhancing pattern recognition and prediction accuracy. Optical neural networks offer the promise of dramatically accelerating computing speed while maintaining low power consumption, even when handling high-speed data streams running at hundreds of gigabits per second. Here, we propose an optical convolutional processor (CP) that leverages the spectral response of an arrayed waveguide grating (AWG) to enhance convolution speed by eliminating the need for repetitive element-wise multiplication. Our design features a balanced AWG configuration, enabling both the positive and negative weightings essential for convolutional kernels. A proof-of-concept 8-bit resolution processor is experimentally implemented using a pair of AWGs with a broadband Mach-Zehnder interferometer (MZI) designed to achieve uniform weighting across the whole spectrum. Experimental results demonstrate the CP's effectiveness in edge detection, and it achieves 96% accuracy in a convolutional neural network for MNIST recognition. This approach can be extended to other common operations, such as pooling and deconvolution in Generative Adversarial Networks. It is also scalable to more complex networks, making it suitable for applications like autonomous vehicles and real-time video recognition. A novel convolutional processor is proposed using the shifted spectral response of a pair of arrayed waveguide gratings (AWGs) to mimic the kernel shifts during image convolution. This inherent mixing of inputs in the AWG's spectral response eliminates the need for repetitive element-wise computations while enabling the simultaneous generation of convolved output maps.
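A software analogue helps clarify the claim about eliminating repetitive element-wise multiplication: the AWG's shifted spectral responses act like the rows of a banded matrix, so one matrix-vector product yields every convolution output simultaneously instead of sliding the kernel step by step. The sketch below is a conceptual illustration of that linear-algebra view, not a model of the optical hardware.

```python
# Express a 1-D "valid" convolution as a single matrix-vector product:
# each matrix row is a shifted copy of the kernel, mirroring how the
# AWG's shifted spectral responses produce all outputs at once.

def conv_matrix(kernel, n):
    """Build the (n - k + 1) x n matrix whose rows are shifted
    copies of the kernel."""
    k = len(kernel)
    rows = []
    for shift in range(n - k + 1):
        row = [0.0] * n
        row[shift:shift + k] = kernel
        rows.append(row)
    return rows

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, -1.0]   # mixed-sign weights, like the balanced AWG
                       # configuration's positive/negative weighting
out = matvec(conv_matrix(kernel, len(signal)), signal)
```

For this ramp signal the difference kernel produces a constant output, the 1-D analogue of the edge-detection demonstration in the abstract.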
Typically, video production requires a lot of manual labor. Sport events can be several hours long, and during that time the camera equipment requires an operator. The cost of labor and the large amount of time spent on video production increase demand for automated video production solutions. However, automatic video production is not a simple task. Sport venues are typically quite large, so capturing the whole venue with a single static camera is often not possible. A solution for increasing the field of view is to create a panoramic video using multiple synchronized cameras. The main problem with this approach is that panorama stitching is computationally expensive. Another challenge is creating natural-looking panoramas. Images captured with different cameras can have color or exposure differences, and the final panoramas might have visible seams or alignment errors. Although image stitching methods have been known for a long time, software solutions capable of creating high-quality panorama videos are lacking. Most software implementations focus on still-image stitching rather than video stitching. This thesis presents a fully automated cloud-based video production system that produces panoramic videos of sports events. The system consists of multiple software components: the video recorder, the event recording manager, the video processing, and the video archive. The focus of this thesis is on the video processing software's image-processing pipeline. The pipeline is implemented as a graph, where each processing step is implemented as a node. The processing steps are distortion correction, cylindrical projection, color and exposure compensation, image blending, and panorama composition. The implemented software is capable of stitching four 2160p video streams into a 7200×3584 resolution panorama stream in real time using an NVIDIA Tesla P4 graphics card. This makes high-quality broadcasts of sport events possible. Compared to traditional broadcasts, the panorama video
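The graph-of-nodes pipeline described in this abstract can be sketched as a chain of processing stages. The node names below mirror the thesis text, but the pass-through implementations are placeholders assumed for illustration; the real stages are GPU image operations, not list manipulations.

```python
# Illustrative pipeline graph: each processing step is a node, and
# frames from the synchronized cameras flow through the nodes in order.

class Node:
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def __call__(self, frames):
        return self.fn(frames)

def make_pipeline(steps):
    """Compose the nodes into a single callable pipeline."""
    def run(frames):
        for node in steps:
            frames = node(frames)
        return frames
    return run

# Placeholder stage bodies; the real pipeline performs per-camera image
# corrections before merging everything into one panorama.
identity = lambda frames: frames
compose = lambda frames: [sum(frames, [])]   # stitch the streams together

pipeline = make_pipeline([
    Node("distortion_correction", identity),
    Node("cylindrical_projection", identity),
    Node("color_exposure_compensation", identity),
    Node("image_blending", identity),
    Node("panorama_composition", compose),
])

# Four camera streams (toy per-camera pixel lists) become one panorama.
panorama = pipeline([[1, 2], [3, 4], [5, 6], [7, 8]])
```

Modeling the pipeline as a graph of nodes keeps each stage independently testable and lets stages be reordered or swapped without touching the rest of the system.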
This work offers a thorough method for real-time dehazing of drone-captured images using different filtering techniques with post-processing improvements. Enhancing visibility and picture clarity in hazy situations is th...
Video super-resolution reconstruction consists of generating high-resolution frames by processing low-resolution ones. This process enhances video quality, allowing the visualisation of fine details. Moreover, it ...
ISBN (digital): 9781510661714
ISBN (print): 9781510661707; 9781510661714
The Internet of Things (IoT) uses cloud-enabled data sharing to connect physical objects to sensors, processing software, and other technologies via the Internet. IoT enables a vast communication network amongst these physical objects and their corresponding data. This study investigates the use of an IoT development board for real-time sensor-data communication and processing, specifically of images from a camera. The IoT development board and camera are programmed to capture images for object detection and analysis. Data processing is performed on board, which includes the microcontroller and wireless communication with the sensor. The IoT connectivity and simulated test results verifying real-time signal communication and processing are presented.
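The capture-process-transmit loop such a board runs can be sketched as below. Every interface here is a stand-in assumed for illustration: the abstract does not name the board's SDK, so `capture`, `detect`, and `transmit` are hypothetical placeholders for the camera read, on-board processing, and wireless send.

```python
# Toy on-board loop: read a frame, process it locally on the
# microcontroller, then transmit only the compact result.

def capture():
    """Stand-in for reading a camera frame (a tiny 2x2 grayscale image)."""
    return [[0, 255], [255, 0]]

def detect(frame):
    """Toy on-board processing: count bright pixels as detected 'objects'."""
    return sum(px > 128 for row in frame for px in row)

def transmit(payload, log):
    """Stand-in for the wireless send; here it just records the payload."""
    log.append(payload)

log = []
for _ in range(3):          # three capture cycles
    frame = capture()
    transmit({"objects": detect(frame)}, log)
```

Doing the detection on board and transmitting only the summary, rather than raw frames, is what keeps the wireless link usable for real-time operation.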