ISBN: 9798350307184 (print)
Deep networks have achieved great success in the image rescaling (IR) task, which seeks to learn optimal downscaled representations, i.e., low-resolution (LR) images, from which the original high-resolution (HR) images can be reconstructed. Compared with super-resolution methods that assume a fixed downscaling scheme, e.g., bicubic, IR often achieves significantly better reconstruction performance thanks to the learned downscaled representations. This highlights the importance of a good downscaled representation. Existing IR methods mainly learn the downscaled representation by jointly optimizing the downscaling and upscaling models. Unlike them, we seek to improve the downscaled representation in a more direct way: optimizing the downscaled image itself instead of the down-/upscaling models. To this end, we propose a Hierarchical Collaborative Downscaling (HCD) method that performs gradient descent w.r.t. the reconstruction loss in both the HR and LR domains to improve the downscaled representations and thereby boost IR performance. Extensive experiments show that HCD significantly improves reconstruction performance both quantitatively and qualitatively. In particular, we improve over popular IR methods by >0.57 dB PSNR on Set5. Moreover, HCD is flexible, since it generalizes well across diverse image rescaling models. The code is available at https://***/xubingna/HCD.
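The core idea of optimizing the downscaled image itself can be illustrated with a toy 1-D sketch. This is not the authors' HCD implementation: it assumes a fixed nearest-neighbour upscaler in place of a learned model, descends only in the LR domain (no hierarchy, no HR-domain step), and every name in it (`upscale`, `refine_lr`, `step`, `iters`) is hypothetical.

```python
def upscale(lr, factor=2):
    """Fixed nearest-neighbour upscaler: repeat each LR sample `factor` times."""
    return [v for v in lr for _ in range(factor)]


def reconstruction_loss(lr, hr, factor=2):
    """Mean squared error between the upscaled LR signal and the HR signal."""
    up = upscale(lr, factor)
    return sum((u - h) ** 2 for u, h in zip(up, hr)) / len(hr)


def refine_lr(lr, hr, factor=2, step=0.5, iters=50):
    """Gradient descent on the LR samples themselves; the upscaler stays fixed."""
    lr = list(lr)
    n = len(hr)
    for _ in range(iters):
        up = upscale(lr, factor)
        # d(loss)/d(lr[i]) sums the residuals of the HR samples produced by lr[i].
        grad = [
            2.0 / n * sum(up[i * factor + k] - hr[i * factor + k] for k in range(factor))
            for i in range(len(lr))
        ]
        lr = [v - step * g for v, g in zip(lr, grad)]
    return lr


hr = [0.0, 0.2, 0.4, 0.8, 1.0, 0.6, 0.2, 0.0]
lr_init = hr[::2]                      # naive downscaling: keep every second sample
lr_refined = refine_lr(lr_init, hr)    # LR signal adjusted to reconstruct HR better
loss_before = reconstruction_loss(lr_init, hr)
loss_after = reconstruction_loss(lr_refined, hr)
```

With this particular upscaler the refined LR samples converge to the mean of each HR pair, so the reconstruction loss drops without touching the upscaler.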
Few-shot image classification aims to categorize novel classes with limited training instances. A key hurdle in few-shot image classification arises from the disjoint nature of training and testing categories, which res...
This paper studies intelligent construction technology based on computer vision, which applies image processing and analysis technology in the construction process to improve construction efficiency and qual...
ISBN: 9798350307184 (print)
Image captioning, like many tasks involving vision and language, currently relies on Transformer-based architectures for extracting the semantics of an image and translating them into linguistically coherent descriptions. Although successful, the attention operator only considers a weighted summation of projections of the current input sample, and therefore ignores the relevant semantic information that can come from the joint observation of other samples. In this paper, we devise a network that can perform attention over activations obtained while processing other training samples, through a prototypical memory model. Our memory models the distribution of past keys and values through the definition of prototype vectors that are both discriminative and compact. Experimentally, we assess the performance of the proposed model on the COCO dataset, in comparison with carefully designed baselines and state-of-the-art approaches, and by investigating the role of each of the proposed components. We demonstrate that our proposal can increase the performance of an encoder-decoder Transformer by 3.7 CIDEr points both when training with cross-entropy only and when fine-tuning with self-critical sequence training. Source code and trained models are available at: https://***/aimagelab/PMA-Net.
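One way to picture attention that also looks at a memory of prototypes is the minimal single-query sketch below. It is an illustration under stated assumptions, not the PMA-Net code: the prototype vectors are given as plain inputs (how they are built from past keys and values is not shown), and the names `attend_with_memory`, `proto_keys`, and `proto_values` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return out, weights


def attend_with_memory(query, keys, values, proto_keys, proto_values):
    """Attend over the current sample's keys/values plus memory prototypes."""
    return attend(query, keys + proto_keys, values + proto_values)


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]        # keys from the current sample
values = [[1.0, 0.0], [0.0, 1.0]]
proto_keys = [[1.0, 1.0]]              # assumed prototype summarizing other samples
proto_values = [[0.5, 0.5]]

out, weights = attend_with_memory(query, keys, values, proto_keys, proto_values)
```

The only change relative to ordinary attention is that the softmax runs over a longer list, so part of the attention mass can go to the prototype entries.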
Corn plays an important role in many fields, but the level of intelligent detection for moldy corn is low. This article proposes a method for identifying moldy corn kernels based on machine vision. First, the image is...
ISBN: 9798350307184 (print)
In this paper, we rethink the low-light image enhancement task and propose a physically explainable and generative diffusion model for low-light image enhancement, termed Diff-Retinex. We aim to integrate the advantages of the physical model and the generative network. Furthermore, we hope to supplement and even deduce the information missing in the low-light image through the generative network. Therefore, Diff-Retinex formulates the low-light image enhancement problem as Retinex decomposition followed by conditional image generation. For the Retinex decomposition, we exploit the strengths of Transformer attention and design a Retinex Transformer decomposition network (TDN) to decompose the image into illumination and reflectance maps. Then, we design multi-path generative diffusion networks to reconstruct the normal-light Retinex probability distribution and address the various degradations in these components respectively, including dark illumination, noise, color deviation, loss of scene contents, etc. Owing to the generative diffusion model, Diff-Retinex is able to restore subtle low-light detail in practice. Extensive experiments conducted on real-world low-light datasets qualitatively and quantitatively demonstrate the effectiveness, superiority, and generalization of the proposed method.
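The Retinex model treats an image as the element-wise product of a reflectance map and an illumination map. The per-pixel sketch below uses a classical heuristic (illumination taken as the maximum colour channel, brightened with a gamma curve) purely to show the decomposition; it is not the TDN or the diffusion networks described in the abstract, and `decompose`, `recompose`, `enhance`, and `gamma` are hypothetical names.

```python
def decompose(pixel, eps=1e-6):
    """Split an RGB pixel into (reflectance, illumination) so that pixel = R * L."""
    illum = max(max(pixel), eps)          # heuristic: illumination = brightest channel
    refl = tuple(c / illum for c in pixel)
    return refl, illum


def recompose(refl, illum):
    """Inverse of `decompose`: multiply reflectance by illumination."""
    return tuple(r * illum for r in refl)


def enhance(pixel, gamma=0.5):
    """Brighten the illumination only, keeping the reflectance (colour ratios)."""
    refl, illum = decompose(pixel)
    return recompose(refl, illum ** gamma)


dark_pixel = (0.04, 0.02, 0.01)           # RGB values in [0, 1]
refl, illum = decompose(dark_pixel)
restored = recompose(refl, illum)         # should reproduce the input pixel
bright_pixel = enhance(dark_pixel)
```

Because only the illumination is modified, the ratios between colour channels are preserved in `bright_pixel`.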
ISBN: 9783031581731; 9783031581748 (print)
Digital radiography (DR) has recently become popular for point-of-care imaging. To reduce radiation exposure, the dose is controlled according to the as low as reasonably achievable (ALARA) principle, which results in low-contrast images. To address this issue, post-processing algorithms such as the Multiscale Image Contrast Amplification (MUSICA) algorithm can be used to enhance the contrast of DR images even with a low radiation dose. In this study, a modification of the MUSICA algorithm is investigated to determine the potential for further contrast improvement specifically for DR images. The conclusion is that combining log compression and its inverse at the appropriate stage with a multi-stage MUSICA and denoising is very promising. The proposed method resulted in an average increase of 66.5% in the mean contrast-to-noise ratio (CNR) for the test images considered.
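The two quantities named above can be sketched as follows. The abstract does not give its exact CNR formula or log mapping, so this assumes one common definition of CNR (absolute difference of region means divided by the background standard deviation) and a `log1p`/`expm1` pair as the compression and its inverse; all function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def cnr(roi, background):
    """Contrast-to-noise ratio: |mean(roi) - mean(background)| / std(background)."""
    return abs(mean(roi) - mean(background)) / pstdev(background)


def log_compress(x):
    """Compress dynamic range; log1p keeps zero-valued pixels at zero."""
    return math.log1p(x)


def log_expand(y):
    """Inverse of `log_compress`."""
    return math.expm1(y)


def relative_increase(before, after):
    """Percentage increase from `before` to `after`."""
    return 100.0 * (after - before) / before


roi = [10, 12, 11, 13]                  # made-up pixel values inside a structure
background = [2, 3, 2, 3]               # made-up pixel values in the background
value = cnr(roi, background)
roundtrip = log_expand(log_compress(100.0))
gain = relative_increase(2.0, 3.33)     # made-up CNR values, not from the paper
```

The pixel values and the before/after CNR pair are invented for illustration only.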
ISBN: 9798350307184 (print)
Despite their success in image synthesis, we observe that diffusion probabilistic models (DPMs) often lack the contextual reasoning ability to learn the relations among object parts in an image, leading to a slow learning process. To solve this issue, we propose a Masked Diffusion Transformer (MDT) that introduces a mask latent modeling scheme to explicitly enhance the DPMs' ability to learn contextual relations among the semantic parts of objects in an image. During training, MDT operates in the latent space to mask certain tokens. Then, an asymmetric masking diffusion transformer is designed to predict masked tokens from unmasked ones while maintaining the diffusion generation process. Our MDT can reconstruct the full information of an image from its incomplete contextual input, thus enabling it to learn the associated relations among image tokens. Experimental results show that MDT achieves superior image synthesis performance, e.g., a new SOTA FID score on the ImageNet dataset, and learns about 3x faster than the previous SOTA, DiT.
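The masking step during training can be pictured with the short sketch below, which randomly hides a fraction of latent tokens and keeps the positions of the visible ones. It covers only the token split, not the asymmetric transformer or the diffusion process, and the mask ratio, seed, and names (`mask_tokens`, `mask_ratio`) are assumptions rather than values from the paper.

```python
import random


def mask_tokens(tokens, mask_ratio=0.5, seed=0):
    """Randomly split tokens into visible (index, token) pairs and masked indices."""
    rng = random.Random(seed)                       # local RNG for reproducibility
    n_mask = int(len(tokens) * mask_ratio)
    masked_idx = set(rng.sample(range(len(tokens)), n_mask))
    visible = [(i, t) for i, t in enumerate(tokens) if i not in masked_idx]
    return visible, sorted(masked_idx)


latent_tokens = [f"z{i}" for i in range(8)]         # stand-ins for latent patch tokens
visible, masked = mask_tokens(latent_tokens)
# A model would be trained to predict the tokens at `masked` from `visible`.
```

Keeping the original indices alongside the visible tokens lets a model attach positional information before predicting the masked positions.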
This paper comprehensively investigates the efficiency and performance of keypoint detection and description methods in computer vision and image processing. Four widely used methods - SIFT, SURF, ORB, and BRISK ...
ISBN: 9781728198354 (print)
Low-light image enhancement aims to improve human perception of images taken in the dark, or the effectiveness of computer vision tasks on them. Low-light images are usually severely lacking in visual information. To tackle this problem, we propose a general Low-light Image Enhancement Transformer Network (LLIEFormer) with a degraded restoration model in this paper. LLIEFormer combines the ability of the Transformer to extract global information with the ability of convolutional neural networks to capture local details. We conduct extensive experiments on various low-light enhancement datasets, including PairL1.6K and FiveK, to demonstrate the effectiveness of our method. The results show that our LLIEFormer has better performance and wider applicability than other advanced methods. Our code will be available at https://***/xunpengyi/LLIEFormer.