Learned image compression (LIC) methods have made significant advances in recent years. In LIC, entropy model is an essential component, which utilizes conditional information to predict the probability distribution o...
ISBN:
(Print) 9781728198354
In this paper, we propose a new architecture for thermal image enhancement that exploits the strengths of both vision transformers and generative adversarial networks. Our approach introduces a thermal loss function, which is specifically employed to produce high-quality images. In addition, we consider fine-tuning based on visible images for thermal image restoration, resulting in an overall improvement in image quality. The performance of the proposed architecture is evaluated using visual quality metrics. The results show significant improvements over the original thermal images and over other established enhancement methods on a subset of the KAIST dataset. The benefit of the proposed enhancement architecture is also verified on detection results, where it achieves better performance by a considerable margin across different versions of the YOLO detector.
ISBN:
(Print) 9781728198354
Omnidirectional image quality assessment (OIQA) aims to predict the perceptual quality of omnidirectional images that cover the full 180°×360° viewing range of the visual environment. Here we propose a blind/no-reference OIQA method named the Local Statistics and Global Semantics (LSGS) metric, which bridges the gap between low-level statistics and high-level semantics of omnidirectional images. Specifically, statistical and semantic features are extracted in separate paths from multiple local viewports and from the hallucinated global omnidirectional image, respectively. A quality regression with a weighting process then maps the extracted quality-aware features to a perceptual quality prediction. Experimental results demonstrate that the proposed LSGS method offers highly competitive performance against state-of-the-art methods.
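The regression-plus-weighting step above can be illustrated with a minimal sketch: per-viewport quality scores are fused through learned weights into one global prediction. The softmax weighting and the function name below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def pool_viewport_scores(scores, weight_logits):
    """Fuse per-viewport quality scores into one prediction using a
    softmax over learned weight logits (illustrative stand-in for the
    weighting process described in the abstract)."""
    scores = np.asarray(scores, dtype=float)
    logits = np.asarray(weight_logits, dtype=float)
    w = np.exp(logits - logits.max())  # numerically stable softmax
    w /= w.sum()
    return float(np.dot(w, scores))
```

With equal logits this reduces to a plain average; a viewport judged more salient simply receives a larger logit and thus a larger share of the final score.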
ISBN:
(Print) 9798350344868; 9798350344851
Most AI systems rely on the premise that the input visual data are sufficient to achieve competitive performance in various tasks. However, the classic task setup rarely considers the challenging, yet common, practical situations where the complete visual data may be inaccessible for various reasons (e.g., restricted view range and occlusions). To this end, we investigate a task setting with incomplete visual input data. Specifically, we study the Scene Graph Generation (SGG) task with various levels of visual data missingness as input. While insufficient visual input naturally leads to a performance drop, we propose to supplement the missing visual information via natural language dialog interactions to better accomplish the task objective. We design a model-agnostic Supplementary Interactive Dialog (SI-Dial) framework that can be jointly learned with most existing models, endowing current AI systems with the ability to conduct question-answer interactions in natural language. Through extensive experiments, we demonstrate the feasibility of this task setting with missing visual input and the effectiveness of the proposed dialog module as a supplementary information source, achieving promising performance improvements over multiple baselines.
ISBN:
(Print) 9798350344868; 9798350344851
Underwater single image super-resolution (UISR) is a challenging task as these images frequently suffer from poor visibility. The best-published UISR works continue to suffer from color degradation, poor texture representation, and loss of finer (high-frequency) details. We propose a novel deep learning-based (DL) UISR model that incorporates spatial information as well as the transformed (wavelet) coefficients of degraded low-resolution (LR) underwater images by intelligent feature management. To ensure the visual quality of the super-resolved image, color channel-specific L1 loss, perceptual loss, and difference of Gaussian (DoG) loss are used in tandem with SSIM loss. We employ publicly available datasets, namely UFO-120 and USR-248, to evaluate the proposed model. The results of our experiments show that our model outperforms existing state-of-the-art methods (e.g., ~9.45%/~1.77% in SSIM and ~0.91%/~1.44% in PSNR on UFO-120/USR-248 ×4, respectively), as demonstrated through quantitative measurements and visual quality assessments.
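The loss mix named above can be sketched as follows. The term weights, Gaussian sigmas, and channel weights are assumptions for illustration (the abstract does not specify them), and the SSIM and perceptual terms are taken as precomputed inputs here rather than implemented.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_loss(pred, target, sigma1=1.0, sigma2=2.0):
    """L1 distance between difference-of-Gaussians responses, which
    emphasizes band-pass (edge / high-frequency) structure."""
    dog = lambda x: gaussian_filter(x, sigma1) - gaussian_filter(x, sigma2)
    return float(np.mean(np.abs(dog(pred) - dog(target))))

def channel_l1(pred, target, weights=(1.0, 1.0, 1.0)):
    """Per-color-channel L1 so each channel can be weighted separately
    (hypothetical weights; the paper's values are not given)."""
    return float(sum(w * np.mean(np.abs(pred[..., c] - target[..., c]))
                     for c, w in enumerate(weights)))

def total_loss(pred, target, ssim_loss, perceptual_loss,
               a=1.0, b=0.1, c=0.1, d=0.1):
    # Weighted sum of the four terms named in the abstract; the DoG term
    # is applied to a simple luminance approximation (channel mean).
    return (a * channel_l1(pred, target) + b * ssim_loss
            + c * perceptual_loss
            + d * dog_loss(pred.mean(axis=-1), target.mean(axis=-1)))
```

For identical prediction and target (and zero SSIM/perceptual residuals), every term vanishes, which is a quick sanity check when wiring such a composite loss into training.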
ISBN:
(Print) 9798350349405; 9798350349399
Detecting small objects in drone-captured images or aerial videos is challenging due to their minimal representation. As data traverses deep learning networks, the information about small objects can diminish, making high-resolution images essential for enhanced detection performance. However, high-resolution images increase computational load undesirably. Leveraging this fact, we propose a streamlined neural network designed specifically for small object detection in high-resolution images. The proposed network encompasses three main components: i) Enhanced High-Resolution processing Module (EHRPM), ii) the Small Object Feature Amplified Feature Pyramid Network (SOFA-FPN) with its Edge Enhancement Module (EEM), Cross Lateral Connection Module (CLCM), and Dual Bottom-up Convolution Module (DBCM), and iii) the Sigmoid Re-weighting Module (SRM). Compared to several state-of-the-art networks, our method delivers superior performance with fewer parameters and a lower computational demand. The source code is available at https://***/datu0615/EHRPM.
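Of the modules named above, the Sigmoid Re-weighting Module invites the simplest illustration: a sigmoid gate that rescales feature channels. The pooling choice and tensor layout below are guesses, since the abstract does not describe the SRM's exact design.

```python
import numpy as np

def sigmoid_reweight(feat):
    """Gate a (C, H, W) feature map: global-average-pool each channel,
    squash the result through a sigmoid, and rescale that channel.
    Illustrative only; not the paper's actual SRM."""
    gap = feat.mean(axis=(1, 2))         # (C,) per-channel descriptors
    gate = 1.0 / (1.0 + np.exp(-gap))    # sigmoid gate in (0, 1)
    return feat * gate[:, None, None]    # broadcast gate over H and W
```

Gates of this kind let the network cheaply suppress channels that carry little small-object evidence, which fits the paper's goal of high accuracy at low computational cost.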
ISBN:
(Print) 9798350344868; 9798350344851
Joint source-channel coding schemes based on deep neural networks (DeepJSCC) have recently achieved remarkable performance for wireless image transmission. However, these methods usually focus only on the distortion of the reconstructed signal at the receiver side with respect to the source at the transmitter side, rather than the perceptual quality of the reconstruction which carries more semantic information. As a result, severe perceptual distortion can be introduced under extreme conditions such as low bandwidth and low signal-to-noise ratio. In this work, we propose CommIN, which views the recovery of high-quality source images from degraded reconstructions as an inverse problem. To address this, CommIN combines Invertible Neural Networks (INN) with diffusion models, aiming for superior perceptual quality. Through experiments, we show that our CommIN significantly improves the perceptual quality compared to DeepJSCC under extreme conditions and outperforms other inverse problem approaches used in DeepJSCC.
With the development of neural networks, the coding efficiency of learned image compression methods gradually exceeds that of traditional image codecs that are carefully designed and optimized by experts. However, the...
ISBN:
(Print) 9781728198354
Image restoration is a challenging, ill-posed, and long-standing problem. In this paper, we propose a multi-branch restoration model inspired by the Human Visual System (i.e., Retinal Ganglion Cells) for raindrop removal. The experiments show that the proposed multi-branch architecture, called CMFNet, achieves state-of-the-art performance. The source code and pretrained models are available at https://***/FanChiMao/CMFNet, and an interactive demonstration of the proposed deraindrop model can be accessed at https://***/dXaeNg.
ISBN:
(Print) 9781728198354
Infrared and visible image fusion aims to integrate salient targets and abundant texture information into a single fused image. Existing methods typically ignore the issue of illumination, leading to weak texture details and poor visual perception under low illumination. To address this issue, we propose a low-light-oriented infrared and visible image fusion network, named L2Fusion. In particular, we first design a decomposition network based on Retinex theory to obtain the reflectance features of a low-light visible image. These features are then integrated with the features extracted from the corresponding infrared image by a residual network. The final fused image largely eliminates the negative impact of low illumination and contains both salient targets and abundant texture information. Extensive experiments demonstrate the superiority of L2Fusion over state-of-the-art methods in terms of both visual effect and quantitative metrics.
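A Retinex-style decomposition of the kind the abstract mentions can be sketched as below, with I = R ⊙ L. The smoothed channel-maximum illumination prior and the parameter values are common classical choices, not necessarily the decomposition the L2Fusion network actually learns.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def retinex_decompose(img, sigma=15.0, eps=1e-6):
    """Split an RGB image (H, W, 3), values in [0, 1], into reflectance R
    and illumination L such that img = R * L elementwise."""
    illum = gaussian_filter(img.max(axis=-1), sigma)  # smooth channel max
    illum = np.clip(illum, eps, 1.0)                  # avoid division by zero
    refl = img / illum[..., None]                     # reflectance per channel
    return refl, illum
```

The low-frequency illumination map absorbs the lighting while the reflectance keeps texture, so fusing the reflectance features with infrared features sidesteps much of the low-light degradation.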