At present, most unmanned aerial vehicle (UAV) smoke detection systems transmit video back to a ground station computer for analysis to determine whether a fire has occurred. Since the image transmission process ta...
ISBN: (digital) 9798350367331
ISBN: (print) 9798350367331; 9798350367348
Few-shot semantic segmentation is a technique with significant potential for medical image segmentation. Most existing few-shot semantic segmentation methods require fully annotated labels for training. However, such methods may not suit medical images, for which data collection and labeling are challenging. To address this issue, this paper proposes an enhanced few-shot semantic segmentation model with a new pre-processing step that generates pseudo-labels automatically. Parallel computing is also employed to accelerate image pre-processing. Experiments on MRI image datasets demonstrate the effectiveness of the new approach, which outperforms conventional few-shot semantic segmentation methods.
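The abstract does not say how the pseudo-labels are computed, so the following is only a minimal sketch of the general pattern: a per-image pseudo-labeling function mapped over a dataset with a pool of workers. The thresholding rule in `pseudo_label` (foreground = pixels brighter than the image mean) and both function names are hypothetical placeholders, not the paper's method.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import List, Optional, Type

Image = List[List[float]]  # one grayscale slice, row-major
Mask = List[List[int]]     # binary pseudo-label with the same shape


def pseudo_label(image: Image) -> Mask:
    """Hypothetical pseudo-label rule: pixels brighter than the image mean become foreground (1)."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [[1 if p > mean else 0 for p in row] for row in image]


def pseudo_label_batch(
    images: List[Image],
    executor_cls: Type[Executor] = ProcessPoolExecutor,
    max_workers: Optional[int] = None,
) -> List[Mask]:
    """Label every image in parallel; Executor.map preserves the input order of the results."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(pseudo_label, images))
```

`ProcessPoolExecutor` is the default because per-image pre-processing in pure Python is CPU-bound and threads would be serialized by the GIL; when using it, call `pseudo_label_batch` from under an `if __name__ == "__main__":` guard. `ThreadPoolExecutor` can be passed instead when the per-image work releases the GIL (for example, in native image libraries).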
ISBN: (print) 9798400716607
With the continuous development of Internet of Things (IoT) technology, laboratory equipment management is gradually moving toward intelligent and remote operation. Targeting data detection for laboratory equipment, this paper proposes an IoT-based patrol device for laboratory equipment image data. By acquiring, processing and transmitting equipment image data, the device enables real-time monitoring and evaluation of equipment operating status and performance. This research offers a useful reference for improving the management efficiency and operating performance of laboratory equipment.
Determining the optical flow of a video is a compute-intensive task essential to computer vision. To achieve this processing in real time, the whole algorithm deployment chain must be designed with efficiency as the first priority. Development is usually divided into two parts: first, designing an algorithm that meets precision constraints; then, implementing and optimizing its execution on the target platform. We argue that unifying these operations improves performance on the embedded processor. This paper is based on an industrial computer vision use case. The objective is to compute dense optical flow in real time on an embedded GPU platform, the Nvidia AGX Xavier. The initially chosen CLG (combined local-global) optical flow method is analyzed to understand the convergence speed of its underlying optimization problem. The Jacobi solver is selected for implementation because of its parallel nature. The whole multi-level processing is then ported to the GPU using several specific optimization strategies. In particular, we use the roofline model to analyze the impact of fusing the solver's iterations. As a result, within a 30 W power budget, our implementation runs at 60 FPS on 640 × 512 images with four-level processing. We hope this example provides feedback on the issues that arise when porting a method to a parallel platform and serves as a basis for further implementations of computer vision algorithms on specialized hardware.
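To illustrate why the Jacobi solver parallelizes well and how the roofline model frames the benefit of iteration fusion, here is a minimal sketch on a generic linear system A x = b rather than the actual CLG equations. Each Jacobi component update reads only the previous iterate, so all components can be computed concurrently (one GPU thread per pixel in the real setting). The helper `fused_intensity` and its assumption that fused sweeps keep intermediate iterates in on-chip memory are illustrative, not taken from the paper.

```python
from typing import List


def jacobi(A: List[List[float]], b: List[float], iterations: int = 50) -> List[float]:
    """Plain Jacobi iteration for A x = b; converges when A is strictly diagonally dominant."""
    n = len(b)
    if any(A[i][i] == 0 for i in range(n)):
        raise ValueError("Jacobi requires a non-zero diagonal")
    x = [0.0] * n
    for _ in range(iterations):
        # Each new component depends only on the previous iterate x,
        # so the n updates are independent and could run in parallel.
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x


def roofline_gflops(peak_gflops: float, bandwidth_gbs: float, flops_per_byte: float) -> float:
    """Roofline bound: attainable throughput is the lower of the compute and memory ceilings."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def fused_intensity(flops_per_sweep: float, bytes_per_pass: float, fused_sweeps: int) -> float:
    """Arithmetic intensity when `fused_sweeps` sweeps share one pass over DRAM.

    Assumes (hypothetically) that intermediate iterates stay in on-chip memory,
    so DRAM traffic per pass is unchanged while the work grows with each fused sweep.
    """
    return fused_sweeps * flops_per_sweep / bytes_per_pass
```

For example, with made-up figures of 1000 GFLOP/s peak and 100 GB/s bandwidth, an unfused kernel at 2 FLOP/byte is memory-bound at 200 GFLOP/s. Fusing enough sweeps to raise the intensity tenfold reaches the compute ceiling. These numbers are placeholders, not AGX Xavier specifications.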
A person retrieval system (PRS) in video surveillance identifies an individual from descriptive attributes, a task that employs several computationally intensive deep learning models. We implement and analyse a PRS for pre-recorded videos on a graphics processing unit (GPU) and on the Nvidia Jetson Orin AGX. This paper presents a new Person Attribute Recognition (PAR) architecture, CorPAR, evaluated with three backbone networks: ConvNeXt, ResNet-50 and EfficientNet-B0. It improves the F1-score by 4.1% with ConvNeXt-Base, 1.63% with ResNet-50 and 8.07% with EfficientNet-B0, surpassing the state-of-the-art Weighted-PAR method. The proposed method applies model compression techniques, namely quantisation and pruning with L1 regularisation, to assess their impact on person retrieval. The study reveals that the PRS using EfficientNet-B0 with 32-bit quantisation achieves the best performance, delivering a throughput of 22 frames per second and a True Positive Rate of 71% on the Nvidia Jetson Orin AGX, matching the performance of the model implemented on a GPU.
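The two compression techniques named above can be sketched in a few lines. This is a generic illustration, not the CorPAR pipeline: `prune_filters_l1` zeroes the filters with the smallest L1 norm (a common reading of L1-based pruning), and `quantize_symmetric` applies symmetric uniform quantisation with a configurable bit width. Both function names and the flat-list weight layout are hypothetical simplifications.

```python
from typing import List, Tuple


def prune_filters_l1(filters: List[List[float]], ratio: float) -> List[List[float]]:
    """Zero out the fraction `ratio` of filters whose weights have the smallest L1 norm."""
    n_prune = int(len(filters) * ratio)
    # Rank filter indices by L1 norm, smallest first.
    order = sorted(range(len(filters)), key=lambda i: sum(abs(w) for w in filters[i]))
    pruned = set(order[:n_prune])
    return [[0.0] * len(f) if i in pruned else list(f) for i, f in enumerate(filters)]


def quantize_symmetric(weights: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Symmetric uniform quantisation: w ≈ scale * q, with q in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale
```

Dequantising with `scale * q` recovers each weight to within half a quantisation step. Pruning whole filters, rather than individual weights, is chosen here because it shrinks layer outputs in a way that embedded runtimes can exploit directly.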
ISBN: (print) 9798350374520; 9798350374513
Synthetic Aperture Radar (SAR) enables the generation of realistic and high-resolution 2D or 3D representations of landscapes. Typically, radar instruments are deployed in specially equipped, low-flying aircraft that capture a significant amount of raw data, necessitating image reconstruction processing. However, the aircraft's limited onboard processing capabilities (power, size, weight, cooling, and communication bandwidth to ground stations) and the need to generate multiple SAR products, such as slant-range and geo-coded images during a single flight, require efficient onboard processing and transmission to the ground station. This paper outlines the processing architecture of the digital beamforming SAR (DBFSAR) employed by the German Aerospace Center (DLR) and the specific measures implemented to enable onboard processing. We elucidate the essential software optimizations and their integration into the SAR onboard routines, facilitating (near) real-time capability under certain conditions. Furthermore, we share the insights gained from our work and discuss their applicability to other processing scenarios with limited resource availability.
Video prediction aims to generate future frames from several given past frames. It has many applications, including abnormal action recognition, future traffic prediction, long-term planning and autonomous driving. Recently, various deep learning-based methods have been proposed for this task. However, these methods seem to focus only on increasing network performance and ignore their computational cost. Some methods even require two separate networks to handle different input types, such as RGB, temporal gradient and optical flow. This makes them increasingly complex and demands extremely large computational cost and memory. In this paper, we introduce a simple yet robust approach that learns both appearance and motion features simultaneously in a single network, regardless of the diversity of input video modalities. Moreover, we present a lightweight autoencoder network to address this issue. We evaluate our framework on several benchmarks, including the KTH, KITTI and BAIR datasets. The experimental results show that our approach achieves competitive performance compared with state-of-the-art video prediction methods while requiring only 34.24 MB of memory and 2.59 GFLOPs. With a smaller model size and lower computational cost, our framework runs faster, with a shorter inference time than the other methods. Requiring only 2.934 s to predict the next frame, our framework is a promising approach for real-time deployment on embedded or mobile devices without a GPU.
A polarization image frame capture device based on the Camera Link interface is proposed and implemented. The device adopts a large-capacity buffer device, multi-bus switching and DSP techno...
ISBN: (print) 9798350353013; 9798350353006
Recent advances in generative AI have significantly enhanced image and video editing, particularly with text-prompt control. State-of-the-art approaches predominantly rely on diffusion models for these tasks. However, diffusion-based methods are computationally demanding and often require large-scale paired datasets for training, which makes deployment in real applications challenging. To address these issues, this paper breaks the text-based video editing task into two stages. First, we leverage a pre-trained text-to-image diffusion model to edit a few keyframes simultaneously in a zero-shot manner. Second, we introduce an efficient model called MaskINT, which is built on non-autoregressive masked generative transformers and specializes in frame interpolation between the edited keyframes, using structural guidance from the intermediate frames. Experimental results suggest that MaskINT achieves performance comparable to diffusion-based methods while significantly reducing inference time. This research offers a practical solution for text-based video editing and showcases the potential of non-autoregressive masked generative transformers in this domain.
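Non-autoregressive masked generative transformers predict all masked tokens in parallel and then, over a small number of steps, keep the most confident predictions and re-mask the rest. The abstract does not give MaskINT's decoding schedule. The sketch below therefore assumes a MaskGIT-style cosine schedule, and it shows only the bookkeeping, not a model. `masked_counts` and `remask` are hypothetical names.

```python
import math
from typing import List, Optional


def masked_counts(num_tokens: int, steps: int) -> List[int]:
    """Tokens still masked after each decoding step under an (assumed) cosine schedule."""
    return [
        math.floor(num_tokens * math.cos(math.pi / 2 * (t + 1) / steps))
        for t in range(steps)
    ]


def remask(confidences: List[float], n_masked: int, fixed: Optional[List[bool]] = None) -> List[bool]:
    """Return True for each token that should be masked again for the next step.

    Tokens fixed in earlier steps get infinite confidence so they are never re-masked;
    among the rest, the n_masked least-confident predictions are masked again.
    """
    fixed = fixed if fixed is not None else [False] * len(confidences)
    scores = [math.inf if f else c for c, f in zip(confidences, fixed)]
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    to_mask = set(order[:n_masked])
    return [i in to_mask for i in range(len(scores))]
```

Because the count of masked tokens falls to zero at the final step, the whole frame's tokens are decoded in a fixed, small number of transformer passes. This explains the inference-time advantage over iterative diffusion sampling.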
Lane detection is critical in autonomous driving and advanced driver assistance systems (ADAS), furnishing vital information for vehicle navigation and safety. The study introduces a lane detection methodology leveragin...