ISBN (digital): 9798331529543
ISBN (print): 9798331529550
Most approaches in learned image compression follow the transform coding scheme. The characteristics of the latent variables transformed from images significantly influence the performance of codecs. In this paper, we present visual analyses of the latent features of learned image compression and find that the latent variables are spread over a wide range, which may lead to complex entropy coding processes. To address this, we introduce a Deviation Control (DC) method, which applies a constraint loss to the latent features and the entropy parameter μ. Training with the DC loss, we obtain latent features with smaller coding-symbol values and smaller σ, effectively reducing entropy coding complexity. Our experimental results show that the plug-and-play DC loss reduces entropy coding time by 30-40% and improves compression performance.
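As a rough illustration of how such a constraint could be attached to training, the sketch below adds an auxiliary penalty on the coding symbols (y − μ) and the scales σ in a PyTorch-style codec; the exact formulation, the L1 penalty, and the weight `lambda_dc` are assumptions, not the paper's definitive loss.

```python
# Minimal sketch of a deviation-control style penalty, assuming a codec that
# exposes the latent y and the predicted entropy parameters (mu, sigma).
import torch

def dc_loss(y: torch.Tensor, mu: torch.Tensor, sigma: torch.Tensor,
            lambda_dc: float = 0.01) -> torch.Tensor:
    """Penalize large coding symbols (y - mu) and large scales sigma, keeping
    the symbol range narrow and the entropy coding process simpler."""
    symbol_penalty = (y - mu).abs().mean()   # constrain deviation of latents from mu
    scale_penalty = sigma.abs().mean()       # encourage small predicted scales
    return lambda_dc * (symbol_penalty + scale_penalty)

# Hypothetical usage inside a training step (rate and distortion from the base codec):
# total_loss = rate + lmbda * distortion + dc_loss(y, mu, sigma)
```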
Image captioning neural networks jointly train an image recognition sub-model and a natural language processing sub-model to generate description sentences for images. This paper presents several image ca...
Exposure errors in images, including both underexposure and overexposure, significantly diminish an image's contrast and visual appeal. Existing deep learning-based exposure correction methods require either large networks or long inference times and are thus not suitable for embedded devices and real-time applications. To address these issues, a lightweight network is proposed in this paper to correct exposure errors with limited memory occupation and few inference steps. It adopts a Laplacian pyramid to incrementally recover the color and details of the image in a layer-by-layer procedure. A structural re-parameterization scheme is designed both to reduce model size and speed up inference and to improve performance through a multi-branch learning structure. Extensive experiments demonstrate that our method achieves a better performance-efficiency trade-off than other exposure correction methods.
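For readers unfamiliar with structural re-parameterization, the sketch below shows the common RepVGG-style idea the abstract alludes to: a multi-branch block used during training is folded into a single 3x3 convolution for inference. The branch composition and channel counts here are assumptions, not the paper's exact design.

```python
# Minimal sketch of structural re-parameterization: train with multiple branches,
# fuse them into one 3x3 conv for fast inference.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)  # 3x3 branch
        self.conv1 = nn.Conv2d(channels, channels, 1)             # 1x1 branch

    def forward(self, x):
        # Multi-branch form used during training: 3x3 + 1x1 + identity.
        return self.conv3(x) + self.conv1(x) + x

    @torch.no_grad()
    def fuse(self) -> nn.Conv2d:
        # Fold all branches into an equivalent single 3x3 conv for inference.
        fused = nn.Conv2d(self.conv3.in_channels, self.conv3.out_channels, 3, padding=1)
        w = self.conv3.weight.clone()
        w += F.pad(self.conv1.weight, [1, 1, 1, 1])        # 1x1 kernel -> centered 3x3
        ident = torch.zeros_like(w)
        for c in range(w.shape[0]):
            ident[c, c, 1, 1] = 1.0                         # identity branch as a 3x3 kernel
        fused.weight.copy_(w + ident)
        fused.bias.copy_(self.conv3.bias + self.conv1.bias)
        return fused
```

After training, calling `fuse()` yields a plain convolution that produces the same output as the three branches combined, which is where the memory and speed savings at inference come from.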
ISBN (digital): 9798331529543
ISBN (print): 9798331529550
This paper introduces an advanced intra prediction method designed for the Enhanced Compression Model (ECM), the reference software for the standard beyond versatile video coding (VVC). It employs a learning-based method to adaptively assign weights for a weighted average across neighboring samples, resulting in more precise prediction samples. The proposed method derives optimized weights for each intra prediction mode, each block size, and each sample position. To achieve a reasonable balance between encoding time and prediction accuracy, the conventional intra prediction mode is shared with the proposed method. Experimental evaluations demonstrate that the proposed method provides a bitrate reduction of up to 0.4%.
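The following is a minimal sketch of sample-wise weighted-average intra prediction as described above: each predicted sample is a weighted sum of the neighboring reference samples, with a weight table indexed by mode, block size, and sample position. The table layout and reference-sample indexing are assumptions for illustration, not the ECM implementation.

```python
# Minimal sketch: predict each block sample as a position-specific weighted
# average of the reconstructed neighboring samples.
import numpy as np

def predict_block(top_refs: np.ndarray, left_refs: np.ndarray,
                  weights: np.ndarray) -> np.ndarray:
    """top_refs:  (W,) reconstructed samples above the block
       left_refs: (H,) reconstructed samples left of the block
       weights:   (H, W, W + H) learned weights for one (mode, block size) pair"""
    refs = np.concatenate([top_refs, left_refs])   # neighboring reference samples
    h, w, _ = weights.shape
    pred = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            pred[y, x] = weights[y, x] @ refs      # per-position weighted average
    return pred
```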
No-Reference Image Quality Assessment (NR-IQA) aims to estimate perceptual image quality without access to reference images. To address it effectively and efficiently, in this work we propose a Context and Saliency aware Transformer Network (CSTNet), which is built on a lightweight pyramid Vision Transformer (ViT). Specifically, a Multi-scale Context Aware Refinement (MCAR) block is devised to fully leverage the hierarchical context features extracted by the ViT backbone. Further, saliency map prediction is incorporated as a sub-task to simulate human attention to salient regions when perceiving images. Extensive experiments on public image quality datasets demonstrate its efficiency and superiority compared to state-of-the-art models.
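The abstract implies a joint training objective: quality score regression plus an auxiliary saliency prediction task. The sketch below illustrates one plausible form of that objective; the loss choices (L1 for quality, binary cross-entropy for saliency) and the weight `alpha` are assumptions.

```python
# Minimal sketch of a joint NR-IQA objective with a saliency sub-task.
import torch.nn.functional as F

def joint_loss(pred_score, gt_score, pred_saliency, gt_saliency, alpha=0.5):
    quality = F.l1_loss(pred_score, gt_score)                                    # main task
    saliency = F.binary_cross_entropy_with_logits(pred_saliency, gt_saliency)   # auxiliary task
    return quality + alpha * saliency
```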
Recently, transformer-based and convolution-based methods have achieved significant results in learned image compression. By comparing the designs of convolutional networks (convnets) and transformers, we replace self-attention with convolution to capture spatial and channel adaptability. We propose a simple attention module (SAM) in the transformer style. Combining the proposed SAM with a channel-wise and checkerboard entropy model, we propose an efficient end-to-end learned image compression method. Despite its simplicity, it obtains strong results and fast coding speed. Experiments demonstrate that our method achieves competitive results compared with previous learning-based methods and conventional image codecs.
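The abstract does not give SAM's exact layer composition, so the sketch below shows a generic convolutional, transformer-style attention block under common design assumptions: a depthwise convolution for spatial adaptability and 1x1 convolutions with multiplicative gating for channel adaptability. It is an illustration of the general idea, not the paper's module.

```python
# Minimal sketch of a conv-based, attention-like block (self-attention replaced
# by convolution, with a residual connection as in transformer blocks).
import torch.nn as nn

class SimpleAttentionModule(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.proj_in = nn.Conv2d(dim, dim, 1)                          # channel mixing
        self.dwconv = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)    # spatial adaptability
        self.gate = nn.Conv2d(dim, dim, 1)                             # channel adaptability
        self.proj_out = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        a = self.dwconv(self.proj_in(x))
        return x + self.proj_out(a * self.gate(x))                     # gated, residual output
```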
ISBN (print): 9781728185514
In this paper, we propose a novel algorithm for summarization-based image resizing. Previously, detecting the precise locations of repeating patterns was required before the pattern removal step of resizing. However, it is difficult to find repeating patterns that are illuminated under different lighting conditions and viewed from different perspectives. To solve this problem, we first identify the regularity unit of the repeating patterns statistically. We then use the regularity unit in shift-map optimization to obtain a better resized image. The experimental results show that our method is competitive with other well-known methods.
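The abstract only says the regularity unit is found "by statistics". One simple statistical route, shown below purely as an illustration and not as the paper's method, is to estimate a repeating-pattern period from the autocorrelation of an intensity profile.

```python
# Minimal sketch: estimate a candidate repetition period via autocorrelation.
import numpy as np

def estimate_period(signal: np.ndarray, min_lag: int = 4) -> int:
    """Return the lag with the strongest autocorrelation peak (candidate period)."""
    s = signal - signal.mean()
    ac = np.correlate(s, s, mode="full")[len(s) - 1:]   # autocorrelation for lags >= 0
    ac[:min_lag] = -np.inf                               # ignore trivially small lags
    return int(np.argmax(ac))

# e.g. on the column-intensity profile of an image region:
# period = estimate_period(image.mean(axis=0))
```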
ISBN (print): 9783031162107; 9783031162091
The usual procedure in Content-Based Image Retrieval (CBIR) is to extract useful low-level features such as color, texture, and shape from the query image and retrieve images that have a similar set of features. However, the problem with using low-level features is the semantic gap between image feature representation and human visual understanding. Many researchers have therefore focused on improving content-based image retrieval methods, with a particular emphasis on reducing the semantic gap between low-level features and human visual perception. These works mainly combine low-level features to better represent the content of an image, which brings it closer to human visual perception but is still not close enough to bridge the semantic gap. In this paper, we start with a comprehensive review of recent research in the field of image retrieval, and then we propose a CBIR system based on a convolutional neural network and transfer learning to extract high-level features, as the initial part of a larger project that aims to retrieve and collect images containing the Arabic language for natural language processing tasks.
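A minimal sketch of the transfer-learning retrieval pipeline described above, assuming a recent torchvision is available: a pretrained CNN with its classifier removed supplies high-level features, and retrieval ranks gallery images by cosine similarity. The backbone (ResNet-50) and the similarity measure are illustrative choices, not necessarily the paper's exact setup.

```python
# Minimal sketch: pretrained-CNN feature extraction + cosine-similarity retrieval.
import torch
from torchvision import models, transforms

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()          # drop the classifier, keep 2048-d features
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(pil_image) -> torch.Tensor:
    """High-level feature vector for one PIL image."""
    return backbone(preprocess(pil_image).unsqueeze(0)).squeeze(0)

def retrieve(query_feat: torch.Tensor, gallery_feats: torch.Tensor, top_k: int = 5):
    """Indices of the gallery images most similar to the query."""
    sims = torch.nn.functional.cosine_similarity(query_feat.unsqueeze(0), gallery_feats)
    return sims.topk(top_k).indices
```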
ISBN (digital): 9798331529543
ISBN (print): 9798331529550
Blind Image Quality Assessment (BIQA) is essential in computational vision for predicting the visual quality of digital images without reference counterparts. Despite advancements through convolutional neural networks (CNNs), a significant challenge in BIQA remains the long-tail distribution of image quality scores, which leads to biased training and reduced model generalization. To address this, we restructured the KonIQ-10k dataset to create an imbalanced version named KonIQ-10k-LT, manipulating the distribution of image quality scores so that the training and validation sets have opposing distributions. This restructuring increases the proportion of certain quality scores in the training set while decreasing them in the validation set. Experimental results show a significant performance decline of BIQA models on the KonIQ-10k-LT dataset compared to the original KonIQ-10k, highlighting the challenge posed by the long-tail distribution. To mitigate this issue, we propose a Proportion Weighted Balancing (PWB) method as a baseline, designed to enhance the robustness and generalization ability of BIQA models. Our findings demonstrate that the proposed PWB method improves the performance and reliability of BIQA models under these challenging conditions.
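The sketch below illustrates one plausible form of proportion-based weighting for a long-tailed score distribution, in the spirit of the PWB baseline: samples from rare quality-score bins receive larger training weights. The binning scheme and inverse-proportion weighting are assumptions, not the paper's exact formulation.

```python
# Minimal sketch: per-sample weights inversely proportional to the frequency
# of each sample's quality-score bin.
import numpy as np

def proportion_weights(scores: np.ndarray, n_bins: int = 10) -> np.ndarray:
    """Return a weight per sample; rare score bins get larger weights."""
    bins = np.linspace(scores.min(), scores.max(), n_bins + 1)
    idx = np.clip(np.digitize(scores, bins) - 1, 0, n_bins - 1)   # bin index per sample
    counts = np.bincount(idx, minlength=n_bins).astype(float)
    props = counts / counts.sum()                                  # bin proportions
    w = 1.0 / np.maximum(props[idx], 1e-8)                         # inverse-proportion weighting
    return w / w.mean()                                            # normalize to mean 1
```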
The virtual reality (VR) conference, as a typical social VR application, has gained popularity in recent years. It offers users at different locations a fully immersive experience and a sense of togetherness. However, remote communication also introduces inevitable latencies, which may adversely affect so-called social presence. There is still a lack of research on the effect of latency on social presence. To fill this gap, this paper examines the impact of latency on the social presence of VR conferences and contrasts it with that of traditional video conferences. Here, social presence is measured using the Networked Minds Social Presence Inventory (NMSPI). We design and conduct two conversation-based subjective tests for both types of conference and compare the impact of latency based on the test results. The conclusions of these studies can serve as guidelines for VR service providers to optimize their conference systems.