ISBN:
(Print) 9798331529543; 9798331529550
The increasing demand for high-quality, real-time visual communication and growing user expectations, coupled with limited network resources, necessitate novel approaches to semantic image communication. This paper presents a method to enhance semantic image communication that combines a novel lossy semantic encoding approach with spatially adaptive semantic image synthesis models. By developing a model-agnostic training augmentation strategy, our approach substantially reduces susceptibility to distortion introduced during encoding, effectively eliminating the need for lossless semantic encoding. Comprehensive evaluation across two spatially adaptive conditioning methods and three popular datasets indicates that this approach enhances semantic image communication in very low bit-rate regimes.
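The augmentation idea above can be sketched as follows: during training, the semantic map is corrupted in a way that mimics lossy-encoding distortion, so the synthesis model learns to tolerate it. The corruption model (random pixel relabeling) and all parameter names here are illustrative assumptions, not the paper's actual encoding pipeline.

```python
import numpy as np

def simulate_lossy_semantic_map(seg_map, flip_prob=0.05, num_classes=10, rng=None):
    """Randomly relabel a fraction of pixels to mimic distortion
    introduced by a lossy semantic encoder (hypothetical stand-in
    for the paper's actual encoding scheme)."""
    rng = rng or np.random.default_rng(0)
    noisy = seg_map.copy()
    mask = rng.random(seg_map.shape) < flip_prob
    noisy[mask] = rng.integers(0, num_classes, size=mask.sum())
    return noisy

seg = np.zeros((64, 64), dtype=np.int64)       # toy semantic label map
noisy = simulate_lossy_semantic_map(seg, flip_prob=0.1)
corrupted = float((noisy != seg).mean())       # fraction of changed labels
```

Training the synthesis model on `noisy` rather than `seg` is the essence of a model-agnostic augmentation: it requires no change to the synthesis architecture itself.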
This paper focuses on the Referring Image Segmentation (RIS) task, which aims to segment objects from an image based on a given language description, having significant potential in practical applications such as food safety detection. Recent advances using the attention mechanism for cross-modal interaction have achieved excellent progress. However, current methods tend to lack explicit principles of interaction design as guidelines, leading to inadequate cross-modal comprehension. Additionally, most previous works use a single-modal mask decoder for prediction, losing the advantage of full cross-modal alignment. To address these challenges, we present a Fully Aligned Network (FAN) that follows four cross-modal interaction principles. Under the guidance of reasonable rules, our FAN achieves state-of-the-art performance on the prevalent RIS benchmarks (RefCOCO, RefCOCO+, G-Ref) with a simple architecture.
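The generic mechanism the abstract builds on, visual tokens attending over language tokens, can be sketched as single-head cross-attention. This is the standard scaled dot-product formulation, not FAN's specific four-principle interaction design.

```python
import numpy as np

def cross_attention(q_vis, k_txt, v_txt):
    """Single-head cross-modal attention: visual queries attend over
    language keys/values (generic mechanism, not FAN's design)."""
    d = q_vis.shape[-1]
    logits = q_vis @ k_txt.T / np.sqrt(d)
    # numerically stable softmax over the language tokens
    attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v_txt, attn

rng = np.random.default_rng(0)
vis = rng.normal(size=(16, 32))   # 16 visual tokens, dim 32
txt = rng.normal(size=(5, 32))    # 5 language tokens, dim 32
out, attn = cross_attention(vis, txt, txt)
```

Each visual token receives a convex combination of language features; a fully aligned decoder would keep both modalities interacting at this stage rather than predicting from visual features alone.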
Most approaches in learned image compression follow the transform coding scheme. The characteristics of latent variables transformed from images significantly influence the performance of codecs. In this paper, we present visual analyses of the latent features of learned image compression and find that the latent variables are spread over a wide range, which may lead to complex entropy coding processes. To address this, we introduce a Deviation Control (DC) method, which applies a constraint loss on latent features and the entropy parameter μ. Training with the DC loss, we obtain latent features with smaller values of coding symbols and σ, effectively reducing entropy coding complexity. Our experimental results show that the plug-and-play DC loss reduces entropy coding time by 30-40% and improves compression performance.
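One plausible form of such a constraint loss is a hinge-style penalty on coding symbols (latent minus predicted mean μ) whose magnitude exceeds a bound, which directly shrinks the symbol range the entropy coder must cover. The exact functional form, bound, and weight below are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def deviation_control_loss(latent, mu, bound=8.0, weight=1e-2):
    """Penalize coding symbols (latent - mu) whose magnitude exceeds
    `bound`; hypothetical stand-in for the paper's DC loss."""
    symbols = latent - mu
    excess = np.maximum(np.abs(symbols) - bound, 0.0)
    return weight * float((excess ** 2).mean())

rng = np.random.default_rng(0)
latent = rng.normal(0, 20, size=(4, 16))   # widely spread latents
mu = np.zeros_like(latent)
loss_wide = deviation_control_loss(latent, mu)
loss_tight = deviation_control_loss(latent * 0.1, mu)  # compact latents
```

Added to the usual rate-distortion objective, a term like this pushes the transform toward compact symbol distributions, which is what shortens entropy coding.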
Image dehazing plays a crucial role in autonomous driving and outdoor surveillance. However, as haze affects different components of an image in various ways and degrees, existing methods treat the image as a singular input and overlook the need to decouple different components, leading to mutual interference during the enhancement of each component. Consequently, issues such as insufficient color restoration or blurred edges may arise. In this paper, we introduce a novel tri-branch network for Single Image Dehazing that independently extracts low-frequency, high-frequency, and semantic information from images using three distinct sub-networks. A meticulously designed fusion network is then employed to integrate the information from these three branches to produce the final dehazed image. To facilitate the training of such a complex network, we propose a two-stage training approach. Experimental results demonstrate that our approach achieves state-of-the-art (SOTA) performance.
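The low/high-frequency decoupling can be illustrated with a classical split: a low-pass filter yields the low-frequency band and the residual is the high-frequency band, so the two can be enhanced separately and recombined exactly. The box blur here is a cheap stand-in; the paper's branches are learned sub-networks, not fixed filters.

```python
import numpy as np

def box_blur(img, k=5):
    """Separable box blur as a simple low-pass filter (stand-in for a
    learned low-frequency branch)."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    kernel = np.ones(k) / k
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, 'valid'), 1, padded)
    out = np.apply_along_axis(lambda c: np.convolve(c, kernel, 'valid'), 0, out)
    return out

def split_bands(img, k=5):
    low = box_blur(img, k)
    high = img - low          # residual carries edges and texture
    return low, high

rng = np.random.default_rng(0)
img = rng.random((32, 32))
low, high = split_bands(img)
recon_err = float(np.abs((low + high) - img).max())  # exact by construction
```

Because `low + high` reconstructs the input exactly, per-band enhancement cannot lose content at the split itself; interference only arises when the bands are processed jointly.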
Generative models have significantly advanced generative AI, particularly in image and video generation. Recognizing their potential, researchers have begun exploring their application in image compression. However, existing methods face two primary challenges: limited performance improvement and high model complexity. In this paper, to address these two challenges, we propose a perceptual image compression solution by introducing a conditional diffusion model. Given that compression performance heavily depends on the decoder's generative capability, we base our decoder on the diffusion transformer architecture. To address the model complexity problem, we implement the diffusion transformer architecture with the Swin Transformer. Equipped with enhanced generative capability, we further augment the decoder with informative features using a multi-scale feature fusion module. Experimental results demonstrate that our approach surpasses existing perceptual image compression methods while achieving lower model complexity.
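A minimal reading of multi-scale feature fusion is: upsample coarser feature maps to the finest resolution and combine them. The nearest-neighbor upsampling and plain averaging below are assumptions for illustration; the paper's module is a learned fusion, not a fixed average.

```python
import numpy as np

def fuse_multiscale(feats):
    """Upsample coarser 2D feature maps to the finest resolution
    (nearest-neighbor) and average them; toy stand-in for a learned
    multi-scale fusion module."""
    target = feats[0].shape
    fused = np.zeros(target)
    for f in feats:
        ry, rx = target[0] // f.shape[0], target[1] // f.shape[1]
        fused += np.repeat(np.repeat(f, ry, axis=0), rx, axis=1)
    return fused / len(feats)

f1 = np.ones((8, 8))          # fine scale
f2 = np.full((4, 4), 2.0)     # mid scale
f3 = np.full((2, 2), 3.0)     # coarse scale
fused = fuse_multiscale([f1, f2, f3])
```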
Recent advancements in learned image compression methods have demonstrated superior rate-distortion performance and remarkable potential compared to traditional compression techniques. However, the core operation of quantization, inherent to lossy image compression, introduces errors that can degrade the quality of the reconstructed image. To address this challenge, we propose a novel Quantization Error Compensator (QEC), which leverages spatial context within latent representations and hyperprior information to effectively mitigate the impact of quantization error. Moreover, we propose a tailored quantization error optimization training strategy to further improve rate-distortion performance. Notably, QEC serves as a lightweight, plug-and-play module, offering high flexibility and seamless integration into various learned image compression methods. Extensive experimental results consistently demonstrate significant coding efficiency improvements achievable by incorporating the proposed QEC into state-of-the-art methods, with a slight increase in runtime.
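The compensation idea can be shown with a toy pipeline: rounding introduces quantization error, and adding back a predicted residual reduces reconstruction MSE. The oracle-scaled predictor below merely stands in for a learned module conditioned on spatial context and hyperprior information, which is what QEC actually is.

```python
import numpy as np

def quantize(y):
    """Hard rounding, the lossy step in latent quantization."""
    return np.round(y)

def compensate(y_hat, residual_pred):
    """Hypothetical compensator: add a predicted residual to the
    dequantized latent (QEC predicts this from context/hyperprior)."""
    return y_hat + residual_pred

rng = np.random.default_rng(1)
y = rng.normal(0, 1, 1000)          # toy latent
y_hat = quantize(y)
# Stand-in prediction: half the true error, imitating an imperfect
# but useful learned estimate.
pred = 0.5 * (y - y_hat)
mse_plain = float(((y - y_hat) ** 2).mean())
mse_comp = float(((y - compensate(y_hat, pred)) ** 2).mean())
```

Any predictor correlated with the true error lowers MSE, which is why even a lightweight plug-in module can pay for its runtime cost.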
Recent advancements in neural compression have surpassed traditional codecs in PSNR and MS-SSIM measurements. However, at low bit-rates, these methods can introduce visually displeasing artifacts, such as blurring, color shifting, and texture loss, thereby compromising the perceptual quality of images. To address these issues, this study presents an enhanced neural compression method designed for optimal visual fidelity. We have trained our model with a sophisticated semantic ensemble loss, integrating Charbonnier loss, perceptual loss, style loss, and a non-binary adversarial loss, to enhance the perceptual quality of image reconstructions. Additionally, we have implemented a latent refinement process to generate content-aware latent codes. These codes adhere to bit-rate constraints and prioritize bit allocation to regions of greater importance. Our empirical findings demonstrate that this approach significantly improves the statistical fidelity of neural image compression.
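Such an ensemble loss is a weighted sum of its terms. The sketch below implements the Charbonnier term exactly (a smooth L1 distance) and treats the perceptual, style, and adversarial terms as precomputed scalars; the weights are illustrative, not the paper's.

```python
import numpy as np

def charbonnier(x, y, eps=1e-3):
    """Charbonnier (smooth L1) distance between two images."""
    return float(np.sqrt((x - y) ** 2 + eps ** 2).mean())

def ensemble_loss(x, y, perc, style, adv, w=(1.0, 0.1, 0.05, 0.01)):
    """Weighted sum of Charbonnier, perceptual, style, and adversarial
    terms; weights and the scalar stand-ins are illustrative only."""
    return w[0] * charbonnier(x, y) + w[1] * perc + w[2] * style + w[3] * adv

x = np.zeros((8, 8))
# Identical images with zero auxiliary terms: loss reduces to eps.
base = ensemble_loss(x, x, perc=0.0, style=0.0, adv=0.0)
```

Balancing the weights is the practical difficulty: the pixel-level term anchors fidelity while the perceptual/adversarial terms trade it for realism.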
Blind Image Quality Assessment (BIQA) is essential in computational vision for predicting the visual quality of digital images without reference counterparts. Despite advancements through convolutional neural networks (CNNs), a significant challenge in BIQA remains the long-tail distribution of image quality scores, leading to biased training and reduced model generalization. To address this, we restructured the KonIQ-10k dataset to create an imbalanced version named KonIQ-10k-LT, manipulating the distribution of image quality scores to have opposing distributions in the training and validation sets. This restructuring increases the proportion of certain quality scores in the training set while decreasing them in the validation set. Experimental results show a significant performance decline of BIQA models on the KonIQ-10k-LT dataset compared to the original KonIQ-10k, highlighting the challenge posed by the long-tail distribution. To mitigate this issue, we propose a Proportion Weighted Balancing (PWB) method as a baseline, designed to enhance the robustness and generalization ability of BIQA models. Our findings demonstrate that the proposed PWB method improves the performance and reliability of BIQA models under these challenging conditions.
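A natural baseline reading of proportion-based balancing is inverse-frequency sample weighting over quality-score bins, so rare-quality images contribute more to the training loss. The binning scheme and normalization below are assumptions, not necessarily the paper's exact PWB formulation.

```python
import numpy as np

def proportion_weights(scores, num_bins=5):
    """Inverse-frequency sample weights over quality-score bins,
    normalized to mean 1; a plausible sketch of proportion-weighted
    balancing, not the paper's exact method."""
    bins = np.linspace(scores.min(), scores.max() + 1e-9, num_bins + 1)
    idx = np.clip(np.digitize(scores, bins) - 1, 0, num_bins - 1)
    counts = np.bincount(idx, minlength=num_bins).astype(float)
    w = 1.0 / np.maximum(counts[idx], 1.0)   # rare bins get large weights
    return w / w.mean()

# Long-tailed toy scores: 90 common high-quality, 10 rare low-quality.
scores = np.concatenate([np.full(90, 4.0), np.full(10, 1.0)])
w = proportion_weights(scores)
```

Multiplying each sample's loss by `w` equalizes the effective contribution of each quality bin, countering the bias induced by the long tail.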
Plenoptic cameras are light field capturing devices able to acquire large amounts of angular and spatial information. The lenslet video produced by such cameras presents on each frame a distinctive hexagonal pattern of micro-images. Due to the particular structure of lenslet images, traditional video codecs perform poorly on lenslet video. Previous works have proposed a preprocessing scheme that cuts and realigns the micro-images on each lenslet frame. While effective, this method introduces high frequency components into the processed image. In this paper, we propose an additional step to the aforementioned scheme by applying an invertible smoothing transform. We evaluate the enhanced scheme on lenslet video sequences captured with single-focused and multi-focused plenoptic cameras. On average, the enhanced scheme achieves 9.85% bitrate reduction compared to the existing scheme.
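The key property of the added step is invertibility: smoothing must be exactly undoable at the decoder. A classic way to get this is integer lifting, shown below as a Haar-style split into smooth averages and details. This toy transform only illustrates invertibility; it is not the smoothing transform the paper uses.

```python
import numpy as np

def lifting_forward(x):
    """Haar-style integer lifting on pixel pairs along the last axis:
    split into a smooth (average-like) band s and a detail band d."""
    a = x[..., ::2].astype(np.int64)
    b = x[..., 1::2].astype(np.int64)
    d = b - a
    s = a + (d >> 1)
    return s, d

def lifting_inverse(s, d):
    """Exact inverse: integer lifting loses nothing."""
    a = s - (d >> 1)
    b = a + d
    out = np.empty(s.shape[:-1] + (s.shape[-1] * 2,), dtype=np.int64)
    out[..., ::2], out[..., 1::2] = a, b
    return out

x = np.arange(16).reshape(2, 8)
s, d = lifting_forward(x)
x_rec = lifting_inverse(s, d)
```

Because the forward and inverse use the same integer shifts, reconstruction is bit-exact, so the transform can suppress the high frequencies introduced by micro-image realignment without adding any loss of its own.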
The rapid advancements in medical imaging have led to a growing demand for high-performance lossless compression of large 3D medical image datasets. Unlike natural images, medical images typically feature three-dimensional structure and high bit depth, necessitating specialized compression techniques. Based on a decoder-only transformer, we propose a learnable dual-decoder model for lossless compression of 3D medical images. Our approach packs voxels into patches, which are processed by a patch-level decoder to extract the patch feature. The voxels, along with the patch feature, are subsequently fed into a voxel-level decoder to model each voxel. This coarse-to-fine modeling strategy reduces the computational time for each voxel and enables modeling of long-range dependencies. Experimental results demonstrate that our proposed model achieves state-of-the-art compression performance, with an approximately 15% improvement in compression performance over the traditional JP3D benchmark on various datasets.
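The voxel-to-patch packing step, the tokenization a patch-level decoder would consume, can be sketched generically: partition the volume into non-overlapping p×p×p blocks and flatten each into a token. The patch size and layout are assumptions; the paper's exact packing may differ.

```python
import numpy as np

def pack_patches(volume, p=4):
    """Pack a 3D volume into non-overlapping p*p*p patches, flattened
    to one token per patch (generic sketch of the packing step)."""
    D, H, W = volume.shape
    assert D % p == 0 and H % p == 0 and W % p == 0
    v = volume.reshape(D // p, p, H // p, p, W // p, p)
    # Bring the three patch-grid axes to the front, then flatten.
    return v.transpose(0, 2, 4, 1, 3, 5).reshape(-1, p ** 3)

vol = np.arange(8 * 8 * 8).reshape(8, 8, 8)   # toy 8^3 volume
patches = pack_patches(vol, p=4)              # 2*2*2 = 8 patches of 64 voxels
```

Running the expensive decoder once per patch instead of once per voxel is what makes the coarse stage cheap; the voxel-level decoder then conditions on the patch feature for the fine stage.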