In this paper, we propose an improved refining segmentation method that enhances the visual quality of the Video-based Point Cloud Compression (V-PCC) encoder. Recently standardized by MPEG, V-PCC provides state-of-the-art performance in compressing dynamic, dense point cloud objects. However, the lossy V-PCC encoder suffers unavoidable visual quality degradation due to lost points: when the encoder converts a 3D point cloud into 2D patches, some of the points constituting the point cloud are not converted. In particular, during the refining segmentation step of 2D patch generation, points whose projection plane changes because of over-smoothing can be discarded. We propose a distance-weighted refining segmentation method that reduces the number of missed points and thus improves visual quality. Experimental results show a noticeable improvement in visual quality with a minor coding gain.
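The distance-weighted idea described in this abstract can be sketched as follows. This is only an illustration of weighting neighbor votes by inverse distance during plane refinement; the function name, the score combination, and the `1/(1+d)` weight are assumptions, not the V-PCC reference implementation.

```python
import math

def refine_plane(point, neighbors, normal_scores, lam=1.0):
    """Choose a projection plane for `point` by combining its own
    normal-similarity score with distance-weighted votes from neighbors.
    `neighbors` is a list of (coordinates, current_plane_index) pairs;
    closer neighbors contribute larger votes, so a distant outlier
    cannot over-smooth the decision."""
    num_planes = len(normal_scores)
    votes = [0.0] * num_planes
    for coords, plane in neighbors:
        d = math.dist(point, coords)
        votes[plane] += 1.0 / (1.0 + d)  # inverse-distance weight
    total = sum(votes) or 1.0
    scores = [normal_scores[p] + lam * votes[p] / total
              for p in range(num_planes)]
    return max(range(num_planes), key=scores.__getitem__)
```

With equal normal scores, two nearby neighbors on plane 1 outvote one far neighbor on plane 0, so the point keeps the locally consistent plane.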
ISBN (digital): 9798331529543
ISBN (print): 9798331529550
Image dehazing plays a crucial role in autonomous driving and outdoor surveillance. However, because haze affects different components of an image in different ways and to different degrees, existing methods that treat the image as a single input overlook the need to decouple these components, leading to mutual interference when each component is enhanced. Consequently, issues such as insufficient color restoration or blurred edges may arise. In this paper, we introduce a novel tri-branch network for single image dehazing that independently extracts low-frequency, high-frequency, and semantic information from images using three distinct sub-networks. A carefully designed fusion network then integrates the information from these three branches to produce the final dehazed image. To facilitate the training of such a complex network, we propose a two-stage training approach. Experimental results demonstrate that our approach achieves state-of-the-art (SOTA) performance.
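The low/high-frequency decoupling that motivates the tri-branch design can be illustrated on a 1-D signal: a moving average gives the low band and the residual gives the high band, and fusing the two recovers the input exactly. The paper uses three learned sub-networks on images; this toy split only shows the decomposition idea, and the window size is arbitrary.

```python
def split_frequencies(signal, k=3):
    """Toy frequency decoupling of a 1-D signal: a size-k moving
    average is the low-frequency band, the residual the high band."""
    half = k // 2
    low = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        low.append(sum(window) / len(window))
    high = [s - l for s, l in zip(signal, low)]
    return low, high
```

Enhancing each band separately (as the two frequency branches do) avoids the mutual interference of processing the mixed signal, while `low[i] + high[i]` still reconstructs the original.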
Most approaches in learned image compression follow the transform coding scheme, and the characteristics of the latent variables transformed from images significantly influence codec performance. In this paper, we present visual analyses of the latent features of learned image compression and find that the latent variables spread over a wide range, which may lead to complex entropy coding. To address this, we introduce a Deviation Control (DC) method, which applies a constraint loss to the latent features and the entropy parameter μ. Training with the DC loss, we obtain latent features with smaller coding-symbol values and smaller σ, effectively reducing entropy coding complexity. Our experimental results show that the plug-and-play DC loss reduces entropy coding time by 30-40% while improving compression performance.
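A constraint loss that penalizes latent values for straying far from the entropy parameter μ could be sketched as below. The hinge form and the `threshold` parameter are assumptions for illustration; the paper's exact constraint may differ.

```python
def deviation_control_loss(latents, mu, threshold=2.0):
    """Hedged sketch of a DC-style constraint: penalize |y - mu| only
    beyond a threshold, so coding symbols (y - mu after quantization)
    stay in a narrow range and entropy coding touches fewer symbols."""
    assert len(latents) == len(mu)
    return sum(max(abs(y - m) - threshold, 0.0)
               for y, m in zip(latents, mu)) / len(latents)
```

In training this term would be added to the usual rate-distortion loss, nudging the analysis transform and the entropy model toward tightly clustered latents.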
Image captioning neural networks jointly train an image recognition sub-model and a natural language processing sub-model to generate description sentences for images. This paper presents several image ca...
Exposure errors in images, including both underexposure and overexposure, significantly diminish an image's contrast and visual appeal. Existing deep learning-based exposure correction methods require either large networks or long inference times and are thus unsuitable for embedded devices and real-time applications. To address these issues, this paper proposes a lightweight network that corrects exposure errors with limited memory occupation and few inference steps. It adopts the Laplacian pyramid to incrementally recover the color and details of the image through a layer-by-layer procedure. A structural re-parameterization scheme is designed both to reduce model size for faster inference and to improve performance via a multi-branch learning structure. Extensive experiments demonstrate that our method achieves a better performance-efficiency trade-off than other exposure correction methods.
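The layer-by-layer Laplacian pyramid recovery can be sketched in 1-D: each level stores the detail lost by down/upsampling, and the last entry is the coarse residual, so summing levels back up reconstructs the signal exactly. The paper applies learned corrections per level on images; this minimal sketch only shows the pyramid mechanics, with naive nearest-neighbor resampling as an assumption.

```python
def downsample(x):
    return [(x[i] + x[min(i + 1, len(x) - 1)]) / 2 for i in range(0, len(x), 2)]

def upsample(x, n):
    return [x[min(i // 2, len(x) - 1)] for i in range(n)]

def laplacian_pyramid(x, levels=2):
    """Build [detail_0, ..., detail_{L-1}, coarse] for a 1-D signal."""
    pyr, cur = [], x
    for _ in range(levels):
        small = downsample(cur)
        pyr.append([a - b for a, b in zip(cur, upsample(small, len(cur)))])
        cur = small
    pyr.append(cur)
    return pyr

def reconstruct(pyr):
    """Invert the pyramid: upsample the coarse level and re-add details."""
    cur = pyr[-1]
    for detail in reversed(pyr[:-1]):
        cur = [a + b for a, b in zip(upsample(cur, len(detail)), detail)]
    return cur
```

A correction network would modify the coarse level (global color/exposure) first and the detail levels afterwards, which is the incremental recovery the abstract describes.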
This paper introduces an advanced intra prediction method designed for the Enhanced Compression Model (ECM), the reference software for the standard beyond Versatile Video Coding (VVC). It employs a learning-based method to adaptively assign weights for a weighted average over neighboring samples, yielding more precise prediction samples. The proposed method derives optimized weights for each intra prediction mode, each block size, and each sample position. To strike a reasonable balance between encoding time and prediction accuracy, the conventional intra prediction mode is shared with the proposed method. Experimental evaluations demonstrate that the proposed method provides a bitrate reduction of up to 0.4%.
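The core operation, predicting a sample as a weighted average of neighboring reference samples, can be sketched as below. In the proposed method the weights would be learned offline per mode, block size, and sample position; here they are plain inputs, and the normalization is an assumption.

```python
def weighted_intra_predict(ref_samples, weights):
    """Predict one sample as a normalized weighted average of the
    neighboring reconstructed reference samples. A learned table would
    supply `weights` indexed by (mode, block size, sample position)."""
    assert len(ref_samples) == len(weights)
    total = sum(weights)
    return sum(r * w for r, w in zip(ref_samples, weights)) / total
```

With uniform weights this degenerates to a plain average (similar in spirit to DC prediction); position-dependent weights let nearer references dominate, which is what improves accuracy.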
No-Reference Image Quality Assessment (NR-IQA) aims to estimate perceptual image quality without access to reference images. To address this task effectively and efficiently, we propose a Context and Saliency aware Transformer Network (CSTNet), built on a lightweight pyramid Vision Transformer (ViT). Specifically, a Multi-scale Context Aware Refinement (MCAR) block is devised to fully leverage the hierarchical context features extracted by the ViT backbone. Further, saliency map prediction is incorporated as a sub-task to simulate human attention to salient regions when perceiving images. Extensive experiments on public image quality datasets demonstrate its efficiency and superiority over state-of-the-art models.
Recently, transformer-based and convolution-based methods have achieved significant results in learned image compression. By comparing the designs of convolutional networks (convnets) and transformers, we replace self-attention with convolution to capture spatial and channel adaptability, yielding a simple attention module (SAM) in the transformer style. Combining the proposed SAM with a channel-wise, checkerboard entropy model, we obtain an efficient end-to-end learned image compression method. Despite its simplicity, it achieves strong results with fast coding speed. Experiments demonstrate that our method is competitive with previous learning-based methods and conventional image codecs.
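The idea of a convolutional stand-in for self-attention can be sketched in 1-D: a local convolution produces a gate that modulates the input, giving spatial adaptivity without a quadratic attention map. The kernel values and the sigmoid gate are illustrative assumptions; the paper's SAM operates on 2-D feature maps with learned kernels.

```python
import math

def conv1d(x, kernel):
    """Same-length 1-D convolution with edge replication padding."""
    half = len(kernel) // 2
    padded = [x[0]] * half + x + [x[-1]] * half
    return [sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(len(x))]

def simple_attention(x, kernel=(0.25, 0.5, 0.25)):
    """Convolution-derived gate modulating the input, in place of
    softmax(QK^T)V: attention-like reweighting at convolution cost."""
    gate = [1.0 / (1.0 + math.exp(-g)) for g in conv1d(x, list(kernel))]
    return [v * g for v, g in zip(x, gate)]
```

Because the gate is computed from a local neighborhood, each position is scaled according to its context, which is the spatial adaptability the abstract refers to.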
ISBN (print): 9781728185514
In this paper, we propose a novel algorithm for summarization-based image resizing. Previously, detecting the precise locations of repeating patterns was required before the pattern-removal step of resizing. However, it is difficult to find repeating patterns that are illuminated under different lighting conditions and viewed from different perspectives. To solve this problem, we first identify the regularity unit of the repeating patterns statistically, and then use it in shift-map optimization to obtain a better resized image. The experimental results show that our method is competitive with other well-known methods.
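Estimating a regularity unit statistically can be illustrated in 1-D with autocorrelation: the lag that maximizes self-similarity of the mean-removed signal is the pattern period. This is a stand-in of my own choosing, not the paper's exact statistic, and it ignores the 2-D and illumination-invariance aspects.

```python
def regularity_period(signal, min_lag=2):
    """Pick the lag that maximizes the autocorrelation of the
    mean-removed signal: a 1-D sketch of finding the regularity
    unit of a repeating pattern without locating each repetition."""
    n = len(signal)
    mean = sum(signal) / n
    c = [s - mean for s in signal]
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, n // 2 + 1):
        score = sum(c[i] * c[i + lag] for i in range(n - lag)) / (n - lag)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Once the unit length is known, a shift-map optimizer can remove whole units rather than hunting for exact pattern locations.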
ISBN (print): 9783031162107; 9783031162091
The usual procedure in Content-Based Image Retrieval (CBIR) is to extract useful low-level features such as color, texture, and shape from the query image and retrieve images that have a similar set of features. However, the problem with low-level features is the semantic gap between image feature representation and human visual understanding. Many researchers are therefore devoted to improving content-based image retrieval methods, with a particular focus on reducing the semantic gap between low-level features and human visual perception. They mainly combine low-level features to better represent the content of an image, which brings it closer to human visual perception but still not close enough to bridge the semantic gap. In this paper we begin with a comprehensive review of recent research in image retrieval, and then propose a CBIR system based on a convolutional neural network and transfer learning to extract high-level features, as the initial part of a larger project that aims to retrieve and collect images containing the Arabic language for natural language processing tasks.
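The retrieval step of such a system reduces to ranking database images by the similarity of their feature vectors. In the proposed system the vectors would come from a pretrained CNN via transfer learning; in this sketch they are given directly, and the use of cosine similarity is an assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query_feat, database, top_k=2):
    """Return the names of the top_k database images whose (CNN-derived)
    feature vectors are most similar to the query's."""
    ranked = sorted(database, key=lambda item: cosine(query_feat, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]
```

Because high-level CNN features encode semantics rather than raw color or texture, nearest neighbors in this space land closer to human judgments of similarity, which is the semantic-gap argument of the paper.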