The task of image captioning has seen considerable success using deep neural networks. This survey offers a thorough overview of the most cutting-edge approaches for deep learning-based unsupervised image captioni...
ISBN: (Print) 9781728198354
In this paper, we propose a new architecture for thermal image enhancement that exploits the strengths of both vision transformers and generative adversarial networks. Our approach includes the introduction of a thermal loss function, which is specifically employed to produce high-quality images. In addition, we consider fine-tuning based on visible images for thermal image restoration, resulting in an overall improvement in image quality. The performance of our proposed architecture is evaluated using visual quality metrics. The results show significant improvements over the original thermal images and over other established enhancement methods on a subset of the KAIST dataset. The proposed enhancement architecture is also validated on detection results, where it improves performance by a considerable margin across different versions of the YOLO detector.
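As a rough illustration of how an adversarial objective and a thermal-specific term might be combined, the PyTorch sketch below pairs a standard generator loss with an L1 intensity-consistency term standing in for the thermal loss; the exact formulation and weighting used in the paper are not given in the abstract, so both are assumptions here.

```python
# Minimal sketch (assumptions): adversarial generator loss plus an L1
# "thermal" consistency term. The paper's actual thermal loss is not
# specified in the abstract; the form and weighting below are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnhancementGeneratorLoss(nn.Module):
    def __init__(self, lambda_thermal: float = 10.0):
        super().__init__()
        self.lambda_thermal = lambda_thermal
        self.adv = nn.BCEWithLogitsLoss()

    def forward(self, disc_logits_on_fake, enhanced, thermal_input):
        # Adversarial term: push the discriminator to label enhanced images as real.
        adversarial = self.adv(disc_logits_on_fake,
                               torch.ones_like(disc_logits_on_fake))
        # Assumed thermal term: keep the enhanced output radiometrically
        # close to the raw thermal input.
        thermal = F.l1_loss(enhanced, thermal_input)
        return adversarial + self.lambda_thermal * thermal

# Dummy usage with random tensors
loss_fn = EnhancementGeneratorLoss()
enhanced = torch.rand(4, 1, 128, 128)
thermal = torch.rand(4, 1, 128, 128)
disc_logits = torch.randn(4, 1)
print(loss_fn(disc_logits, enhanced, thermal).item())
```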
Few-shot image classification aims to learn a model that can adapt to unseen classes with few labeled data. This challenging problem requires overcoming the distribution shift of features due to differences between t...
ISBN: (Print) 9789819752119; 9789819752126
This report presents the outcomes of the Summer Challenge on Writer Verification, hosted as part of the Eighth National Conference on Computer Vision, Pattern Recognition, Image Processing, and Graphics (NCVPRIPG) held at IIT Jodhpur on July 21-23, 2023. This challenge introduces a novel dataset comprising images of handwritten text contributed by 1,352 unique writers. Predominantly, these images feature handwritten Hindi text, but they also encompass a variety of elements such as numbers, mathematical symbols, and English text. Participants were tasked with developing a model capable of automatically determining whether the text in a given pair of images is authored by the same writer or by different writers. The primary objective of this challenge was to advance research in the realm of handwritten text recognition. Throughout the competition, we registered 108 teams, with 18 teams submitting results for the validation dataset. Of these, 13 teams provided submissions for the semi-final dataset. The top six teams from the semi-finals were subsequently invited to compete in the finals. Additional details about the challenge can be found at https://***/challenges/wv2023/.
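For context, a common baseline for this kind of pairwise verification task is a shared (Siamese) embedding network whose outputs are compared with cosine similarity. The PyTorch sketch below is generic, not any participant's solution; the backbone, embedding size, and decision threshold are illustrative assumptions.

```python
# Generic baseline sketch for pairwise writer verification: embed each
# handwriting image with a shared CNN and threshold the cosine similarity.
# Architecture and threshold are illustrative, not from the challenge.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WriterEmbedder(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, dim)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return F.normalize(self.proj(h), dim=1)  # unit-length embedding

def same_writer(model, img_a, img_b, threshold=0.5):
    # Cosine similarity of the two embeddings, thresholded to a yes/no decision.
    sim = (model(img_a) * model(img_b)).sum(dim=1)
    return sim > threshold

model = WriterEmbedder()
a, b = torch.rand(1, 1, 64, 256), torch.rand(1, 1, 64, 256)
print(same_writer(model, a, b))
```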
ISBN: (Print) 9798350307184
We present Contrastive Feature Masking Vision Transformer (CFM-ViT), an image-text pretraining methodology that achieves simultaneous learning of image- and region-level representation for open-vocabulary object detection (OVD). Our approach combines the masked autoencoder (MAE) objective with the contrastive learning objective to improve the representation for localization tasks. Unlike the classical MAE, which reconstructs in pixel space, we perform reconstruction in the joint image-text embedding space, which leads the model to better learn region-level semantics. Moreover, we introduce Positional Embedding Dropout (PED) to address scale variation between image-text pretraining and detection finetuning by randomly dropping out the positional embeddings during pretraining. PED improves detection performance and enables the use of a frozen ViT backbone as a region classifier, preventing the forgetting of open-vocabulary knowledge during detection finetuning. On the LVIS open-vocabulary detection benchmark, CFM-ViT achieves a state-of-the-art 33.9 APr, surpassing the best approach by 7.6 points, and achieves better zero-shot detection transfer. Finally, CFM-ViT acquires a strong image-level representation, outperforming the state of the art on 8 out of 12 metrics on zero-shot image-text retrieval benchmarks.
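The abstract describes Positional Embedding Dropout (PED) only at a high level: positional embeddings are randomly dropped during pretraining. The PyTorch sketch below shows one plausible reading, skipping the positional embedding for an entire batch with a fixed probability; the drop rate and the all-or-nothing granularity are assumptions, not details from the paper.

```python
# Illustrative sketch of Positional Embedding Dropout (PED): with some
# probability, the positional embeddings are skipped during pretraining.
# Drop rate and whole-batch granularity are assumptions.
import torch
import torch.nn as nn

class TokensWithPED(nn.Module):
    def __init__(self, num_patches=196, dim=768, ped_rate=0.5):
        super().__init__()
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        self.ped_rate = ped_rate

    def forward(self, patch_tokens):
        # patch_tokens: (batch, num_patches, dim)
        if self.training and torch.rand(()) < self.ped_rate:
            return patch_tokens                # positional embeddings dropped
        return patch_tokens + self.pos_embed   # positional embeddings added

layer = TokensWithPED()
layer.train()
tokens = torch.randn(2, 196, 768)
print(layer(tokens).shape)
```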
Object detection in challenging weather conditions is a formidable hurdle in the realm of computer vision. Unfavorable weather circumstances, encompassing rain, snow, fog, and low light conditions, substantially imped...
Reversible Data Hiding in Encrypted Images (RDHEI) embeds information while protecting the content of images from being leaked, allowing users to decrypt image content, extract embedded information, and losslessly rec...
Artificial Intelligence is a fast-growing domain that facilitates innovation across various business and manufacturing industries. Machine learning, in particular, enables the automatic inspection of the manu...
With the rapid development of computer vision and image processing, geometric shape recognition has become a highly regarded research area. This study aims to explore a method that combines the FAST feature point reco...
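Since the study builds on FAST feature points, the OpenCV sketch below shows only the keypoint-detection step on a synthetic grayscale image; how those points feed into shape recognition in the study is not reproduced or assumed here.

```python
# Minimal OpenCV sketch: detect FAST corner keypoints on a grayscale image.
# Only the keypoint-detection step is shown.
import cv2
import numpy as np

# Synthetic test image: a white rectangle on a black background.
img = np.zeros((240, 320), dtype=np.uint8)
cv2.rectangle(img, (80, 60), (240, 180), 255, thickness=-1)

fast = cv2.FastFeatureDetector_create(threshold=25, nonmaxSuppression=True)
keypoints = fast.detect(img, None)

print(f"Detected {len(keypoints)} FAST keypoints")
vis = cv2.drawKeypoints(img, keypoints, None, color=(0, 0, 255))
cv2.imwrite("fast_keypoints.png", vis)
```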
Detecting and managing various types of defects that occur in the manufacturing process is important for product quality control. Detecting flaws in product presentation is an ongoing research topic in computer vision...