In the last few years, the abundance of available plankton images has significantly increased due to advancements in acquisition system technology. Consequently, a growing interest in automatic plankton image classif...
This paper investigates the optimization and deployment of the YOLOv7 deep learning model on the NVIDIA Jetson Nano, an AI-focused edge computing platform, for object detection in various computer vision applications. The work...
Event cameras record sparse illumination changes with high temporal resolution and high dynamic range. Thanks to their sparse recording and low consumption, they are increasingly used in applications such as AR/VR and autonomous driving. Current top-performing methods often ignore specific event-data properties, leading to the development of generic but computationally expensive algorithms, while event-aware methods do not perform as well. We propose Event Transformer(+), which improves on our seminal work EvT with a refined patch-based event representation and a more robust backbone to achieve more accurate results, while still benefiting from event-data sparsity to increase its efficiency. Additionally, we show how our system can work with different data modalities and propose specific output heads for event-stream classification (i.e., action recognition) and per-pixel predictions (dense depth estimation). Evaluation results show better performance than the state of the art while requiring minimal computation resources, both on GPU and CPU.
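To illustrate how a patch-based representation can exploit event sparsity, here is a minimal sketch. It is not the EvT implementation: the event tuple layout `(x, y, t, polarity)`, the function name `active_patches`, and the `min_events` threshold are all assumptions made for this example. The idea shown is only that patches with too few events can be skipped before any further processing.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical event layout: (x, y, timestamp, polarity).
Event = Tuple[int, int, float, int]


def active_patches(
    events: Iterable[Event], patch_size: int, min_events: int = 1
) -> Dict[Tuple[int, int], int]:
    """Count events per (row, col) patch and keep only sufficiently active patches."""
    counts = Counter(
        (y // patch_size, x // patch_size) for x, y, _t, _p in events
    )
    return {patch: n for patch, n in counts.items() if n >= min_events}


# Four events on an assumed 12x12 sensor split into 4x4 patches (9 patches total).
events = [(0, 0, 0.0, 1), (1, 2, 0.1, -1), (5, 1, 0.2, 1), (9, 9, 0.3, 1)]
patches = active_patches(events, patch_size=4, min_events=2)
print(patches)  # {(0, 0): 2}
```

With this threshold only one of the nine patches would be passed on, which is the kind of saving a sparsity-aware model can draw on.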
The significance of high-speed machine vision in scientific and technological fields is growing, especially in the era of Industry 4.0 technologies. Several pattern-matching algorithms have intriguing applications in ultralow-latency machine vision processing. However, the low frame rate of image sensors, which usually operate at tens of hertz, fundamentally limits the processing rate. This paper conceptualizes and develops a computerized pattern recognition technique that can be applied to investigate light beam profiles and extract the information required for this case study. In the current work, automatic detection and inspection of laser spots were designed to analyze and align the laser beam relative to the electron beam spot using the LabVIEW graphical programming environment, especially when the laser and electron beams overlap. This is one of the important steps toward the fundamental aim of the test-FEL: producing short wavelengths with the second, third, and fifth harmonics at 131.5, 88, and 53 nm, respectively. The tentative version of the program achieved its elementary purpose, the accurate transverse alignment of the ultrashort laser pulses with the electron beam in the FEL test facility at MAX-Lab, and also allowed the beam's stability and jitter range to be studied. Copyright (C) 2024 The Authors.
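The abstract does not say how spot positions are computed, and the original work uses LabVIEW. Purely as an illustration of the quantities involved (spot position, laser-to-electron offset, jitter), the following sketch uses an intensity-weighted centroid, which is a common choice but an assumption here. The function names, the `threshold` parameter, and the tiny test images are all hypothetical.

```python
from statistics import pstdev
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]
Point = Tuple[float, float]


def spot_centroid(image: Image, threshold: float = 0.0) -> Point:
    """Intensity-weighted centroid (x, y) of pixels brighter than threshold."""
    total = sum_x = sum_y = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            weight = value - threshold
            if weight > 0:
                total += weight
                sum_x += weight * x
                sum_y += weight * y
    if total == 0:
        raise ValueError("no pixel above threshold")
    return sum_x / total, sum_y / total


def offset(reference: Point, other: Point) -> Point:
    """Displacement (dx, dy) from the reference spot to the other spot."""
    return other[0] - reference[0], other[1] - reference[1]


def jitter(centroids: List[Point]) -> Point:
    """Population standard deviation of centroid x and y over a frame sequence."""
    xs = [c[0] for c in centroids]
    ys = [c[1] for c in centroids]
    return pstdev(xs), pstdev(ys)


laser = [[0, 0, 0], [0, 4, 0], [0, 0, 0]]
electron = [[0, 0, 0], [0, 0, 2], [0, 0, 2]]
misalignment = offset(spot_centroid(laser), spot_centroid(electron))
print(misalignment)  # (1.0, 0.5)
```

In this toy case the electron spot sits one pixel to the right of and half a pixel below the laser spot; converting pixels to physical units would require a camera calibration that the abstract does not give.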
ISBN: (print) 9783031434143; 9783031434150
The vision transformer is a model that breaks down each image into a sequence of tokens with a fixed length and processes them similarly to words in natural language processing. Although increasing the number of tokens typically results in better performance, it also leads to a considerable increase in computational cost. Motivated by the saying "A picture is worth a thousand words," we propose an innovative approach to accelerate the ViT model by shortening long images. Specifically, we introduce a method for adaptively assigning token length for each image at test time to accelerate inference speed. First, we train a Resizable-ViT (ReViT) model capable of processing input with diverse token lengths. Next, we extract token-length labels from ReViT that indicate the minimum number of tokens required to achieve accurate predictions. We then use these labels to train a lightweight Token-Length Assigner (TLA) that allocates the optimal token length for each image during inference. The TLA enables ReViT to process images with the minimum sufficient number of tokens, reducing token numbers in the ViT model and improving inference speed. Our approach is general and compatible with modern vision transformer architectures, significantly reducing computational costs. We verified the effectiveness of our methods on multiple representative ViT models on image classification and action recognition.
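The token-length labelling step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the candidate token lengths (49, 100, 196), the function names, and the rule of falling back to the largest length when no length gives a correct prediction are assumptions for this example.

```python
from typing import Dict, List, Sequence


def token_length_label(correct_by_length: Dict[int, bool], fallback: int) -> int:
    """Smallest token length whose prediction was correct, else the fallback."""
    correct_lengths = [n for n, ok in correct_by_length.items() if ok]
    return min(correct_lengths) if correct_lengths else fallback


def build_labels(
    per_image_results: Sequence[Dict[int, bool]],
    candidate_lengths: Sequence[int],
) -> List[int]:
    """Label each image with its minimum sufficient token length."""
    fallback = max(candidate_lengths)  # assumed policy for images never classified correctly
    return [token_length_label(r, fallback) for r in per_image_results]


# Hypothetical correctness of a resizable model at each token length, per image.
results = [
    {49: True, 100: True, 196: True},     # easy image: the fewest tokens suffice
    {49: False, 100: True, 196: True},    # needs a medium token length
    {49: False, 100: False, 196: False},  # never correct: use the fallback
]
labels = build_labels(results, candidate_lengths=[49, 100, 196])
print(labels)  # [49, 100, 196]
```

Labels of this kind would then serve as training targets for the lightweight assigner, which predicts a token length from the image before the main model runs.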
Remote sensing scene categorization (RSSC) is a long-standing, vital, and complex issue in computer vision. It seeks to classify a scene into one of the predetermined scene groups by analysing the entire image. The rise of large-scale datasets and the resurgence of deep learning-based methods, which directly learn potent feature representations from large amounts of raw data, have led to substantial progress in representing and classifying RS scenes. Convolutional neural networks (CNNs) are among the most heavily researched types of deep neural network. Taking advantage of the swift increase in the number of labelled samples and major improvements in processing power, CNN research has advanced swiftly, producing state-of-the-art results on a number of applications. In this overview, we present a comprehensive evaluation of earlier published surveys and recent CNN-based approaches for RSSC. This study covers more than 100 significant works on scene categorization, including problems, benchmark datasets, and qualitative performance evaluation. In view of the results so far, this study concludes with a list of intriguing research opportunities.
ISBN: (print) 9798350388350; 9798350388343
Image stabilization plays a crucial role in providing accurate and reliable visual information for machine vision applications. In maritime applications, such as unmanned ship navigation, where six degrees of freedom (DOF) motion and harsh maritime conditions prevail, the efficacy of image stabilization technology is vital for robust image processing algorithms. This paper offers a comprehensive review of image stabilization techniques tailored for maritime environments, developed over the past two decades. We analyzed a total of 39 research articles on the subject, sourced from the Web-of-Science, SCOPUS, and Engineering Index databases, and discuss potential research directions to address the limitations of current image stabilization methods, with special consideration for the unique requirements of ship-borne cameras. The review provides an up-to-date overview of the techniques, limitations, and algorithms of ship-borne cameras for maritime applications, identifying current knowledge gaps and areas requiring further research. It aims to guide the development of new technologies and methods to improve the performance of image stabilization systems in maritime contexts.
This study aims to improve the overall performance of welding operations through image processing. It uses a combination of computer vision and machine learning to better detect and track we...
Robot vision servo control systems play an important role in modern automation systems, and image feature extraction and tracking, as their key components, have a direct impact on their performance and application scope. ...
Image coding for multi-task applications, catering to both human perception and machine vision, has been extensively investigated. Existing methods often rely on multiple task-specific encoder-decoder pairs, leading t...