ISBN: (Print) 9798350343557
The process of multi-modal image registration is fundamental in remote sensing and visual navigation applications. However, existing image registration methods designed for single-modality images do not provide satisfactory results when applied to multi-modal image registration. In this research, our objective is to achieve highly accurate alignment of both infrared and optical (visible-range) images. To accomplish this goal, we explore the effectiveness of the Swin Transformer encoder and cosine loss in enhancing the keypoint-based image registration process. Simulation results show the improvement achieved in multi-modal registration by using a transformer-based Siamese network.
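The cosine loss mentioned in the abstract can be sketched as follows, a minimal NumPy version assuming infrared and optical descriptors are already matched row-wise (the function name and pairing convention are illustrative, not taken from the paper):

```python
import numpy as np

def cosine_descriptor_loss(desc_ir, desc_opt):
    """Cosine loss over matched keypoint descriptor pairs:
    1 - mean cosine similarity. Each row of the two arrays is the
    descriptor of one corresponding keypoint pair."""
    a = desc_ir / np.linalg.norm(desc_ir, axis=1, keepdims=True)
    b = desc_opt / np.linalg.norm(desc_opt, axis=1, keepdims=True)
    return 1.0 - float(np.mean(np.sum(a * b, axis=1)))
```

Identical descriptor sets give a loss of 0, orthogonal ones a loss of 1, so minimizing it pulls cross-modal descriptors of the same keypoint together regardless of their magnitudes.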
ISBN: (Print) 9798350388978; 9798350388961
Image captioning is the process by which computer systems automatically describe images; visual information regarding the content of the images is thus expressed in textual form. This paper presents a deep learning-based Turkish image captioning study implemented using vision transformers and text decoders. In the proposed study, images are initially encoded with a vision transformer-based module. Afterwards, the features of the encoded image are normalized by passing them through a feature projection module. In the final stage, image captions are generated via a text decoder block. To test the performance of the Turkish image captioning system presented in this paper, TasvirEt, a benchmark dataset of Turkish image captions, was used. The tests yielded quite successful results: a BLEU-1 value of 0.3406, a BLEU-2 value of 0.2110, a BLEU-3 value of 0.1253, a BLEU-4 value of 0.0690, a METEOR value of 0.1610, a ROUGE-L value of 0.3145, and a CIDEr value of 0.3879 were measured.
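The "feature projection module" described above can be sketched as a linear map followed by layer normalization; the exact layer composition is an assumption here, shown only to make the normalization step concrete:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each feature vector to zero mean and unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def project_features(patch_feats, W, b):
    """Project ViT patch features into the decoder's space, then
    layer-normalize them (assumed composition of the projection module)."""
    return layer_norm(patch_feats @ W + b)
```

After projection, every patch token has comparable scale, which keeps the text decoder's cross-attention well-conditioned regardless of the encoder's output statistics.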
Author: Chen, Zhaoguo, College of Arts, Shandong Agricultural Engineering University, Jinan 250103, Shandong Province, China
To fully harness the capabilities of computer graphics and image processing technologies and elevate the quality of visual communication design, this paper presents a comprehensive suite of innovative methodologies. F...
ISBN: (Print) 9798350379068; 9798350379051
In this study, we propose a technique to improve the accuracy and reduce the size of convolutional neural networks (CNNs) running on edge devices for real-world robot vision applications. CNNs running on edge devices must have a small architecture, and CNNs for robot vision applications involving on-site object recognition must be trainable efficiently to identify specific visual targets from data obtained under a limited variation of conditions. The visual nervous system (VNS) is a good example that meets these requirements because it learns from few visual experiences. Therefore, we used a Gabor filter, a model of the feature extractor of the VNS, as a preprocessor for CNNs and investigated the accuracy of CNNs trained with small amounts of data. To evaluate how well CNNs trained on image data acquired under a limited variation of conditions generalize to data acquired under other conditions, we created an image dataset consisting of images acquired from different camera positions and investigated the accuracy of CNNs trained using images acquired at a fixed distance. The results were compared after training multiple CNN architectures with and without Gabor filters as preprocessing. The results showed that preprocessing with Gabor filters improves the generalization performance of CNNs and contributes to reducing their size.
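The Gabor preprocessing step can be sketched by building the kernel itself: a cosine carrier modulated by an oriented Gaussian envelope. Convolving input images with a bank of such kernels (several `theta` values) before the CNN is the idea described above; the parameter values below are illustrative defaults, not the paper's settings:

```python
import numpy as np

def gabor_kernel(size=11, sigma=2.0, theta=0.0, lam=4.0, gamma=0.5, psi=0.0):
    """Real-valued Gabor kernel: cos carrier (wavelength lam, phase psi)
    times a Gaussian envelope (scale sigma, aspect gamma) rotated by theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)     # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lam + psi)
    return envelope * carrier
```

Because the kernel is fixed rather than learned, it extracts oriented edge energy without consuming training data, which is exactly why it helps when the training set is small.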
ISBN: (Print) 9783031581809; 9783031581816
Recent approaches to image captioning typically follow an encoder-decoder architecture. The feature vectors extracted from the region proposals obtained from an object detector network serve as input to the encoder. Without any explicit spatial information about the visual regions, the caption synthesis model is limited to learning relationships from captions alone. However, the structure between the semantic units in images and sentences differs. This work introduces a grid-based spatial position encoding scheme to learn relationships from both domains. Furthermore, bilinear pooling is used with attention to exploit spatial and channel-wise attention distributions and capture second-order interactions between multi-modal inputs. These components are integrated within the Transformer architecture, achieving a competitive CIDEr score.
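One way to realize a grid-based spatial position encoding is to snap each detected region to a cell of a fixed grid and use the cell id as its position token. Mapping by box center onto an 8x8 grid is an assumption here; the paper's exact scheme may differ:

```python
import numpy as np

def grid_position_ids(boxes, img_w, img_h, grid=8):
    """Assign each region box (x1, y1, x2, y2) the id of the grid cell
    containing its center, yielding a discrete position per region that
    a learned position embedding table can index."""
    cx = (boxes[:, 0] + boxes[:, 2]) / 2.0 / img_w   # normalized center x
    cy = (boxes[:, 1] + boxes[:, 3]) / 2.0 / img_h   # normalized center y
    col = np.clip((cx * grid).astype(int), 0, grid - 1)
    row = np.clip((cy * grid).astype(int), 0, grid - 1)
    return row * grid + col
```

The discrete ids play the same role for regions that token positions play for words, letting the encoder attend over spatial layout rather than an arbitrary region ordering.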
ISBN: (Print) 9798350367164; 9798350367157
Single-image high dynamic range (HDR) reconstruction has been receiving much attention for recovering image details and showing the possibility of simulating the brightness distribution of the real world. While most current works focus on recovering overexposed areas, this work focuses on underexposed regions and the brightness adjustment of the whole image. This paper proposes a plug-in module with a histogram-guided image binning method for low-light HDR restoration. The plug-in module is mainly designed around histogram feature extraction and image-binning-based brightness restoration, enhancing recovery in darker regions. Extensive experimentation demonstrates the effectiveness of the approach in enhancing the visual quality of low-light images and preserving details in underexposed areas. Under extremely low-light conditions, networks using this plug-in module achieve up to a 0.8227 PSNR improvement and a 0.8278 PU21-PSNR improvement.
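Histogram-guided binning can be sketched by partitioning the intensity range into equal-population bins derived from the image histogram; using quantiles for the edges is an assumed approximation of the paper's method, shown only to illustrate the data flow:

```python
import numpy as np

def histogram_guided_bins(img, n_bins=8):
    """Partition intensities into equal-population bins via quantiles of
    the image histogram; return the per-pixel bin index and the bin edges.
    A restoration network can then apply a brightness gain per bin, so
    dark bins get stronger amplification than bright ones."""
    edges = np.quantile(img, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.searchsorted(edges, img, side='right') - 1
    return np.clip(idx, 0, n_bins - 1), edges
```

Because the bins follow the histogram rather than a fixed spacing, an underexposed image gets most of its bins concentrated in the dark range, which is where the module focuses its restoration.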
ISBN: (Print) 9798350343557
Printed circuit board (PCB) assemblies in everyday electronic devices are mass-produced. As a result of this production volume, a fast way of performing visual inspection is necessary. An integral part of visual inspection systems is PCB component classification. In this paper, we have explored the use of the Vision Transformer (ViT), a recent state-of-the-art image classification approach, for PCB component classification. We have employed several ViT models available in the literature and also proposed a new compact, efficient, and high-performing ViT model, named ViT-Mini. We have conducted extensive experiments on the FICS-PCB dataset in order to comparatively evaluate the ViT models' performance. The highest achieved accuracy is 99.46% for capacitor and resistor classification and 96.52% for classification of capacitors, resistors, inductors, transistors, diodes, and ICs. The proposed compact model's performance is comparable with that obtained with larger models, which indicates its suitability for real-time applications.
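The ViT front end common to all these models can be sketched by the patch-extraction step that turns a component image into a token sequence; patch size 16 is a typical choice, not necessarily the one used by ViT-Mini:

```python
import numpy as np

def patchify(img, patch=16):
    """Split an HxWxC image into flattened, non-overlapping patches:
    the token sequence a ViT classifier linearly embeds and feeds to
    its transformer encoder."""
    h, w, c = img.shape
    img = img[:h - h % patch, :w - w % patch]          # drop ragged edges
    gh, gw = img.shape[0] // patch, img.shape[1] // patch
    patches = img.reshape(gh, patch, gw, patch, c).swapaxes(1, 2)
    return patches.reshape(gh * gw, patch * patch * c)
```

A compact ViT variant mainly shrinks what comes after this step (embedding width, depth, heads); the tokenization itself is the same, which is why small and large models are directly comparable on the same inputs.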
ISBN: (Print) 9798350388978; 9798350388961
Images captured in low-light conditions often suffer from various issues such as noise, insufficient brightness, loss of sharpness, lack of detail, and color distortion. Although many efforts have been made to overcome these problems, the desired results have not yet been achieved. In this study, we investigated the effectiveness of enhancing low-light images by feeding the Edge-Connect architecture, which has shown successful results on the inpainting problem, with additional edge information, followed by using residual attention mechanisms in the intermediate layers. The experiments show that the best performance is achieved by using a Convolutional Block Attention Module in the residual layers. Compared to state-of-the-art methods, the proposed method improves the PSNR, SSIM, and LPIPS metrics, and the visual results are closer to the ground truth.
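The Convolutional Block Attention Module (CBAM) named above can be sketched as two sequential gates on a feature map: channel attention from pooled vectors through a shared MLP, then spatial attention from channel-wise pooled maps. The final 7x7 convolution of the original module is replaced here by a simple mean, so this is an illustration of the data flow, not a faithful reimplementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam(feat, w_down, w_up):
    """Minimal CBAM sketch on a (C, H, W) feature map. Channel attention:
    average- and max-pooled channel vectors pass through a shared
    bottleneck MLP (w_down, w_up). Spatial attention: channel-wise average
    and max maps are combined (mean here stands in for the 7x7 conv)."""
    avg = feat.mean(axis=(1, 2))
    mx = feat.max(axis=(1, 2))
    ca = sigmoid(w_up @ np.maximum(w_down @ avg, 0.0)
                 + w_up @ np.maximum(w_down @ mx, 0.0))
    feat = feat * ca[:, None, None]                    # channel reweighting
    sa = sigmoid((feat.mean(axis=0) + feat.max(axis=0)) / 2.0)
    return feat * sa[None, :, :]                       # spatial reweighting
```

Both gates output values in (0, 1), so CBAM can only suppress features, never amplify them; placed in residual layers it steers the enhancement toward informative channels and regions.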
Image style transfer is a technique in computer vision by which the artistic style of one image is applied to the content of another while keeping the structural features. Image style transfer finds applications in cr...
ISBN: (Print) 9783031585340; 9783031585357
This article introduces a novel multi-modal image fusion approach based on the Convolutional Block Attention Module and dense networks to enhance human perceptual quality and information content in the fused images. As pre-processing, the proposed model preserves the edges of the infrared images and enhances the contrast of the visible images. The use of the Convolutional Block Attention Module results in the extraction of more refined features from the source images. The visual results demonstrate that the fused images produced by the proposed method are visually superior to those generated by most standard fusion techniques. To substantiate the findings, quantitative analysis is conducted using various metrics. The proposed method exhibits the best Naturalness Image Quality Evaluator and Chen-Varshney metric values, which are human-perception-based parameters. Moreover, the fused images exhibit the highest standard deviation value, signifying enhanced contrast. These results confirm that the proposed multi-modal image fusion technique outperforms standard methods both qualitatively and quantitatively, producing fused images with improved human perceptual quality.
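The contrast-enhancement pre-processing for the visible image can be sketched with a percentile-based contrast stretch; the abstract does not specify the method, so both the technique and the percentile limits here are assumptions:

```python
import numpy as np

def stretch_contrast(img, low_p=1.0, high_p=99.0):
    """Percentile-based contrast stretch: map the [low_p, high_p]
    percentile range of intensities onto [0, 1], clipping outliers.
    A stand-in for the visible-image contrast enhancement step."""
    lo, hi = np.percentile(img, [low_p, high_p])
    return np.clip((img - lo) / max(hi - lo, 1e-8), 0.0, 1.0)
```

Stretching the visible image before fusion directly raises the standard deviation of its contribution, consistent with the contrast gain the quantitative analysis reports.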