Transform and entropy models are the two core components of deep image compression neural networks. Most existing learning-based image compression methods utilize convolution-based transforms, which lack the ability...
Despite their short history, neural image codecs have been shown to surpass classical image codecs in rate-distortion performance. However, most of them suffer from significantly longer decoding times, which hinders their practical application. This issue is especially pronounced when an effective yet time-consuming autoregressive context model is employed, since it increases entropy decoding time by orders of magnitude. In this paper, unlike most previous works that pursue optimal RD performance while largely overlooking coding complexity, we conduct a systematic investigation of rate-distortion-complexity (RDC) optimization in neural image compression. By quantifying decoding complexity as a factor in the optimization objective, we can precisely control the RDC trade-off and demonstrate how the rate-distortion performance of neural image codecs can adapt to various complexity demands. Going beyond this investigation, a variable-complexity neural codec is designed that exploits spatial dependencies adaptively according to industrial demands and supports fine-grained complexity adjustment by balancing the RDC trade-off. By implementing this scheme in a powerful base model, we demonstrate the feasibility and flexibility of RDC optimization for neural image codecs.
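A plausible form of such a rate-distortion-complexity objective, given here only as an illustrative sketch (the abstract does not spell out the exact formulation or how decoding complexity is measured), augments the usual rate-distortion Lagrangian with a weighted complexity term:

L_RDC = R + λ·D + γ·C

where R is the expected bitrate, D the reconstruction distortion, C a measure (or differentiable proxy) of decoding complexity, and λ, γ the weights that set the target point on the RDC trade-off; sweeping γ would then trade rate-distortion performance against decoding cost.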
ISBN (Print): 9798350364439; 9798350364422
Neural compression has the potential to revolutionize lossy image compression. Based on generative models, recent schemes achieve unprecedented compression rates at high perceptual quality, but they compromise semantic fidelity: details of decompressed images may appear optically flawless yet be semantically different from the originals, making compression errors difficult or impossible to detect. We explore the problem space and propose a provisional taxonomy of miscompressions. It defines three types of "what happens" and a binary "high impact" flag indicating miscompressions that alter symbols. We discuss how the taxonomy can facilitate risk communication and research into mitigations.
Image compression methods based on machine learning have achieved high rate-distortion performance. However, the reconstructions they produce suffer from blurring at extremely low bitrates (below 0.1 bpp), resulting in low perceptual quality. Although some methods attempt to reconstruct sharp images using Generative Adversarial Networks (GANs), reconstructing natural textures at low bitrates remains challenging. In this paper, we propose a novel image compression method that explicitly utilizes semantic information. Specifically, we send a semantic label map to the decoder, which takes it as input. This semantic information enables the decoder to reconstruct textures consistent with the corresponding semantic classes. Although semantic label maps can be compressed to relatively small sizes using common methods (e.g., PNG), their size is not negligible in an extremely low-rate setting. To address this problem, we propose simple yet effective label map compression strategies, including an autoregressive label map compressor. Our strategies significantly reduce the data size of the label map while preserving the critical semantic information that allows the decoder to reconstruct realistic and suitable textures. By utilizing this data-efficient semantic information, our method can reconstruct realistic images even at an extremely low bitrate. As a result, the proposed method outperformed existing models, including a GAN-based model designed for low-rate settings and a state-of-the-art semantically guided method, in both quantitative evaluations and user studies. Furthermore, we analyzed the effect of semantic information by switching the input label map, confirming that the model synthesizes textures appropriate to the given semantic labels.
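The following PyTorch-style sketch illustrates the general idea of conditioning a decoder on a semantic label map; the module and channel sizes (SemanticConditionedDecoder, latent_ch, num_classes, emb_ch) are hypothetical and do not reflect the paper's actual architecture or its autoregressive label map compressor.

import torch
import torch.nn as nn

class SemanticConditionedDecoder(nn.Module):
    # Hypothetical sketch: a synthesis network that consumes both the compressed
    # latent and an embedded semantic label map, so textures can be generated
    # consistently with the semantic class at each location.
    def __init__(self, latent_ch=192, num_classes=19, emb_ch=32):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes, emb_ch)
        self.synthesis = nn.Sequential(
            nn.ConvTranspose2d(latent_ch + emb_ch, 128, 5, stride=2, padding=2, output_padding=1),
            nn.GELU(),
            nn.ConvTranspose2d(128, 3, 5, stride=2, padding=2, output_padding=1),
        )

    def forward(self, latent, label_map):
        # label_map: (B, H, W) integer class ids, assumed already downsampled
        # to the spatial resolution of the latent.
        emb = self.label_emb(label_map).permute(0, 3, 1, 2)  # (B, emb_ch, H, W)
        return self.synthesis(torch.cat([latent, emb], dim=1))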
ISBN (Print): 9798350358483; 9798350358490
Wavelet-like transforms based on convolutional neural networks (CNNs) are content-adaptive and have achieved remarkable results in end-to-end image compression. However, the subsequent sequential processing of each subband in the entropy module takes a relatively long decoding time, which is inconvenient for real-world applications. In this work, for lossy image compression, the wavelet-like transform is transplanted into the prevailing autoencoder structure to enhance the analysis and synthesis transforms, owing to its excellent decomposition capability. The obtained subbands of different frequencies undergo a hierarchical decorrelation architecture for subband fusion, called the cross fusing module. Specialized treatment is applied to different subbands according to their spatial resolution to attain a more compact latent representation. In addition, the proposed solution features an architecture that decouples the arithmetic decoding process from the sample prediction process, which significantly reduces decoding complexity. Experiments on the Kodak test set show that the proposed method achieves a -3.04% BD-rate compared with an existing decoupled end-to-end structure in terms of RGB Peak Signal-to-Noise Ratio (PSNR).
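As a rough illustration of what a CNN-based wavelet-like (lifting) transform looks like, the sketch below implements a single learned lifting step; it is an assumption-labelled toy, not the paper's transform, cross fusing module, or decoupled entropy architecture.

import torch
import torch.nn as nn

class LiftingStep(nn.Module):
    # Hypothetical sketch of one CNN-based lifting step: split the input along
    # the vertical axis, predict the odd rows from the even ones (detail/high
    # band), then update the even rows with the detail (approximation/low band).
    def __init__(self, channels=3):
        super().__init__()
        self.predict = nn.Conv2d(channels, channels, 3, padding=1)
        self.update = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        # Assumes an even number of rows.
        even, odd = x[:, :, 0::2, :], x[:, :, 1::2, :]
        high = odd - self.predict(even)   # high-frequency (detail) subband
        low = even + self.update(high)    # low-frequency (approximation) subband
        return low, high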
ISBN (Print): 9781728198354
Motivated by an efficiency investigation of the Transformer-based transform coding framework SwinT-ChARM, we first propose to enhance it with a more straightforward yet effective Transformer-based channel-wise auto-regressive prior model, resulting in an absolute image compression transformer (ICT). Current methods that still rely on ConvNet-based entropy coding are limited in modeling long-range dependencies due to their local connectivity and an increasing number of architectural biases and priors. In contrast, the proposed ICT can capture both global and local contexts from the latent representations and better parameterize the distribution of the quantized latents. Further, we leverage a learnable scaling module with a sandwich ConvNeXt-based pre-/post-processor to extract a more compact latent representation while reconstructing higher-quality images. Extensive experimental results on benchmark datasets show that the proposed adaptive image compression transformer (AICT) framework significantly improves the trade-off between coding efficiency and decoder complexity over the Versatile Video Coding (VVC) reference encoder (VTM-18.0) and the neural codec SwinT-ChARM.
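To make the channel-wise auto-regressive idea concrete, here is a minimal sketch of how entropy parameters can be predicted chunk by chunk over channels; for simplicity it uses 1x1 convolutions where ICT/AICT use Transformer blocks, and all names and sizes (latent_ch, num_chunks, hyper_ch) are assumptions rather than the paper's configuration.

import torch
import torch.nn as nn

class ChannelWiseAutoregressivePrior(nn.Module):
    # Hypothetical sketch: the latent is split into channel chunks, and the
    # Gaussian parameters (mu, sigma) of each chunk are predicted from the
    # hyperprior features plus all previously decoded chunks, so decoding
    # proceeds chunk by chunk instead of pixel by pixel.
    def __init__(self, latent_ch=320, num_chunks=5, hyper_ch=192):
        super().__init__()
        self.chunk_ch = latent_ch // num_chunks
        self.param_nets = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(hyper_ch + i * self.chunk_ch, 224, 1),
                nn.GELU(),
                nn.Conv2d(224, 2 * self.chunk_ch, 1),
            )
            for i in range(num_chunks)
        ])

    def forward(self, y, hyper):
        chunks = torch.split(y, self.chunk_ch, dim=1)
        decoded, params = [], []
        for i, net in enumerate(self.param_nets):
            ctx = torch.cat([hyper] + decoded, dim=1)
            mu, sigma = net(ctx).chunk(2, dim=1)
            params.append((mu, sigma))
            decoded.append(chunks[i])  # at decode time: the dequantized chunk
        return params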
ISBN (Print): 9781665492577
End-to-end deep trainable models are about to exceed the performance of traditional handcrafted compression techniques on videos and images. The core idea is to learn a non-linear transformation, modeled as a deep neural network, that maps the input image into a latent space, jointly with an entropy model of the latent distribution. The decoder is also learned as a deep trainable network, and the distortion is measured on the reconstructed image. These methods enforce the latents to follow some prior distribution. Since these priors are learned by optimization over the entire training set, performance is optimal on average. However, they cannot fit every single new instance exactly, which hurts compression performance by enlarging the bitstream. In this paper, we propose a simple yet efficient instance-based parameterization method to reduce this amortization gap at a minor cost. The proposed method is applicable to any end-to-end compression method, improving the compression bitrate by 1% without any impact on reconstruction quality.
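The abstract does not detail the instance-based parameterization itself; the sketch below instead shows a closely related, commonly used way of shrinking the amortization gap, namely refining the latent of a single image by gradient descent on the same rate-distortion objective. The names (refine_latent, entropy_model, decoder) are placeholders, and entropy_model is assumed to return a differentiable rate estimate.

import torch

def refine_latent(y_init, decoder, entropy_model, x, lam=0.01, steps=100, lr=1e-3):
    # Hypothetical sketch: decoder and entropy_model stay frozen; only the
    # continuous latent y is optimized for this particular image, then
    # quantized before actual entropy coding.
    y = y_init.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([y], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        rate = entropy_model(y)                 # estimated bits for the latent
        dist = torch.mean((x - decoder(y)) ** 2)
        loss = rate + lam * dist
        loss.backward()
        opt.step()
    return torch.round(y.detach())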
ISBN (Print): 9798350353013; 9798350353006
While replacing Gaussian decoders with a conditional diffusion model enhances the perceptual quality of reconstructions in neural image compression, their lack of inductive bias for image data restricts their ability to achieve state-of-the-art perceptual levels. To address this limitation, we adopt a non-isotropic diffusion model at the decoder side. This model imposes an inductive bias aimed at distinguishing between frequency contents, thereby facilitating the generation of high-quality images. Moreover, our framework is equipped with a novel entropy model that accurately models the probability distribution of the latent representation by exploiting spatio-channel correlations in latent space, while accelerating the entropy decoding step. This channel-wise entropy model leverages both local and global spatial contexts within each channel chunk. The global spatial context is built upon a Transformer specifically designed for image compression tasks. The designed Transformer employs a Laplacian-shaped positional encoding, whose learnable parameters are adaptively adjusted for each channel cluster. Our experiments demonstrate that the proposed framework yields better perceptual quality than cutting-edge generative codecs, and that the proposed entropy model contributes notable bitrate savings. The code is available at https://***/Atefeh-Khoshtinat/Blur-dissipated-compression.
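Since the abstract only names the Laplacian-shaped positional encoding without giving its form, the sketch below shows one plausible realization: an additive attention bias that decays exponentially with spatial distance, with a learnable scale and bandwidth per head (standing in for the per-channel-cluster parameters mentioned above). All identifiers here are assumptions, not the released code.

import torch
import torch.nn as nn

class LaplacianPositionalBias(nn.Module):
    # Hypothetical sketch: produces an additive attention bias whose magnitude
    # decays Laplacian-style, exp(-distance / bandwidth), with the L1 distance
    # between token positions; scale and bandwidth are learnable.
    def __init__(self, num_heads=8):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_heads))
        self.bandwidth = nn.Parameter(torch.ones(num_heads))

    def forward(self, coords):
        # coords: (N, 2) float tensor of token positions on the latent grid.
        dist = torch.cdist(coords, coords, p=1)                 # (N, N)
        bw = self.bandwidth.abs().view(-1, 1, 1) + 1e-6         # keep positive
        bias = self.scale.view(-1, 1, 1) * torch.exp(-dist.unsqueeze(0) / bw)
        return bias                                             # (heads, N, N), added to attention logits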