Exposure errors in images, including both underexposure and overexposure, significantly diminish contrast and visual appeal. Existing deep learning-based exposure correction methods either require large networks or incur long inference times, and are thus unsuitable for embedded devices and real-time applications. To address these issues, this paper proposes a lightweight network that corrects exposure errors with a limited memory footprint and few inference steps. It adopts the Laplacian pyramid to incrementally recover the color and details of the image through a layer-by-layer procedure. A structural re-parameterization design both reduces model size to speed up inference and improves performance through a multi-branch learning structure. Extensive experiments demonstrate that our method achieves a better performance-efficiency trade-off than other exposure correction methods.
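The layer-by-layer idea behind a Laplacian pyramid can be illustrated with a minimal sketch. It is not the network described in the abstract: it works on a single row of pixel values, uses a simple pair-averaging filter instead of a Gaussian, and all function names are hypothetical. It only shows how an image splits into detail layers plus a coarse base, and how the layers are added back from coarse to fine.

```python
def downsample(signal):
    """Halve the length by averaging neighbouring pairs (box filter, an assumption)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the length by repeating each sample."""
    return [value for value in signal for _ in range(2)]


def build_laplacian_pyramid(signal, levels):
    """Return [detail_0, ..., detail_{levels-1}, coarse_base].

    The input length must be divisible by 2 ** levels.
    """
    pyramid = []
    current = list(signal)
    for _ in range(levels):
        coarse = downsample(current)
        predicted = upsample(coarse)
        # The detail layer stores what the coarse level cannot represent.
        pyramid.append([c - p for c, p in zip(current, predicted)])
        current = coarse
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid):
    """Rebuild the signal from coarse to fine, adding one detail layer per step."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = [u + d for u, d in zip(upsample(current), detail)]
    return current


row = [10.0, 12.0, 40.0, 44.0, 200.0, 180.0, 90.0, 94.0]
pyr = build_laplacian_pyramid(row, levels=2)
restored = reconstruct(pyr)
```

In a correction pipeline of the kind the abstract describes, a learned module would presumably adjust the coarse base (global color and brightness) and then each detail layer in turn before the layers are summed; the sketch leaves every layer unmodified, so `restored` equals `row`.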
ISBN: 9798331529543 (digital)
ISBN: 9798331529550 (print)
This paper introduces an advanced intra prediction method designed for the Enhanced Compression Model (ECM), the reference software for video coding beyond the Versatile Video Coding (VVC) standard. It employs a learning-based method to adaptively assign the weights of a weighted average over neighboring samples, resulting in more precise prediction samples. The proposed method derives optimized weights for each intra prediction mode, each block size, and each sample position. To achieve a reasonable balance between encoding time and prediction accuracy, the conventional intra prediction mode is shared with the proposed method. Experimental evaluations demonstrate that the proposed method provides a bitrate reduction of up to 0.4%.
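A position-dependent weighted average over neighboring samples can be sketched as follows. This is an illustrative assumption, not the ECM implementation: the weight values, the table layout keyed by (mode, block size), the 6-bit fixed-point precision, and the function name are all hypothetical. The point is only that each sample position has its own weight vector, selected by intra mode and block size.

```python
WEIGHT_SHIFT = 6                 # weights are integers that sum to 64 (assumed precision)
WEIGHT_SUM = 1 << WEIGHT_SHIFT

# Hypothetical table: (intra_mode, block_size) -> weights[y][x][reference_index].
WEIGHT_TABLE = {
    (18, 2): [
        [[32, 0, 32, 0], [16, 32, 16, 0]],
        [[16, 0, 16, 32], [8, 24, 8, 24]],
    ],
}


def predict_block(reference, intra_mode, block_size):
    """Predict a block as a per-position weighted average of reference samples."""
    weights = WEIGHT_TABLE[(intra_mode, block_size)]
    block = []
    for row_weights in weights:
        row = []
        for sample_weights in row_weights:
            assert sum(sample_weights) == WEIGHT_SUM
            acc = sum(w * r for w, r in zip(sample_weights, reference))
            # Add half the divisor before shifting so the result rounds to nearest.
            row.append((acc + (WEIGHT_SUM >> 1)) >> WEIGHT_SHIFT)
        block.append(row)
    return block


# Four reconstructed neighbouring samples (illustrative values).
reference_samples = [100, 104, 120, 90]
predicted = predict_block(reference_samples, intra_mode=18, block_size=2)
```

In a learning-based scheme, the entries of such a table would be fitted offline on training data; integer weights with a shift keep the prediction bit-exact between encoder and decoder.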
No-Reference Image Quality Assessment (NR-IQA) aims to estimate perceptual image quality without access to reference images. To address this task effectively and efficiently, we propose a Context and Saliency aware Transformer Network (CSTNet), which is built on a lightweight pyramid Vision Transformer (ViT). Specifically, a Multi-scale Context Aware Refinement (MCAR) block is devised to fully leverage the hierarchical context features extracted by the ViT backbone. Further, saliency map prediction is incorporated as a sub-task to simulate human attention to salient regions when perceiving images. Extensive experiments on public image quality datasets demonstrate its efficiency and superiority compared to state-of-the-art models.
Recently, transformer-based and convolution-based methods have achieved significant results in learned image compression. After comparing the designs of convolutional networks (convnets) and transformers, we replace self-attention with convolution to capture spatial and channel adaptability. We propose a simple attention module (SAM) in the transformer style. Combining the proposed SAM with a channel-wise and checkerboard entropy model, we propose an efficient end-to-end learned image compression method. The method is simple yet obtains strong results and efficient coding speed. Experiments demonstrate that it achieves competitive results compared with previous learning-based methods and conventional image codecs.
ISBN: 9781728185514 (print)
In this paper, we propose a novel algorithm for summarization-based image resizing. Previously, the precise locations of repeating patterns had to be detected before the pattern removal step in resizing. However, it is difficult to find repeating patterns that are illuminated under different lighting conditions and viewed from different perspectives. To solve this problem, we first identify the regularity unit of the repeating patterns statistically. We then use the regularity unit in shift-map optimization to obtain a better resized image. The experimental results show that our method is competitive with other well-known methods.
ISBN: 9783031162107; 9783031162091 (print)
The usual procedure in Content-Based Image Retrieval (CBIR) is to extract useful low-level features such as color, texture, and shape from the query image and retrieve images that have a similar set of features. However, the problem with low-level features is the semantic gap between image feature representation and human visual understanding. For this reason, many researchers are devoted to improving content-based image retrieval methods, with a particular focus on reducing the semantic gap between low-level features and human visual perception. These researchers mainly combine low-level features to obtain a better representation of image content, which brings it closer to human visual perception but still not close enough to close the semantic gap. In this paper, we begin with a comprehensive review of recent research in the field of image retrieval. We then propose a CBIR system based on a convolutional neural network and transfer learning to extract high-level features, as an initial part of a larger project that aims to retrieve and collect images containing Arabic text for natural language processing tasks.
ISBN: 9798331529543 (digital)
ISBN: 9798331529550 (print)
Blind Image Quality Assessment (BIQA) is essential in computational vision for predicting the visual quality of digital images without reference counterparts. Despite advancements through convolutional neural networks (CNNs), a significant challenge in BIQA remains the long-tail distribution of image quality scores, which leads to biased training and reduced model generalization. To address this, we restructured the KonIQ-10k dataset to create an imbalanced version named KonIQ-10k-LT, manipulating the distribution of image quality scores so that the training and validation sets have opposing distributions. This restructuring increases the proportion of certain quality scores in the training set while decreasing them in the validation set. Experimental results show a significant performance decline of BIQA models on the KonIQ-10k-LT dataset compared to the original KonIQ-10k, highlighting the challenge posed by the long-tail distribution. To mitigate this issue, we propose a Proportion Weighted Balancing (PWB) method as a baseline, designed to enhance the robustness and generalization ability of BIQA models. Our findings demonstrate that the proposed PWB method improves the performance and reliability of BIQA models under these challenging conditions.
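The abstract does not spell out how PWB is computed. The sketch below shows one common way to counter a long-tail score distribution, inverse-proportion sample weighting, purely as an illustration of the general idea. The binning scheme, the 0 to 100 score range, and all names are assumptions and should not be read as the authors' formula.

```python
from collections import Counter


def proportion_weights(scores, num_bins=5, lo=0.0, hi=100.0):
    """Weight each sample inversely to the share of its score bin.

    Weights are scaled so that their mean over the dataset is 1.
    """
    width = (hi - lo) / num_bins
    bins = [min(int((s - lo) / width), num_bins - 1) for s in scores]
    counts = Counter(bins)
    occupied = len(counts)          # number of bins that contain samples
    total = len(scores)
    return [total / (occupied * counts[b]) for b in bins]


def weighted_mse(predictions, targets, weights):
    """Mean squared error in which rare-score samples count more."""
    weighted = [w * (p - t) ** 2 for p, t, w in zip(predictions, targets, weights)]
    return sum(weighted) / sum(weights)


# Illustrative quality scores: four mid-range images, one poor, one excellent.
quality_scores = [55.0, 58.0, 52.0, 57.0, 20.0, 90.0]
sample_weights = proportion_weights(quality_scores)
```

With these weights, the two rare scores contribute as much to the loss as the four common ones combined, which is the balancing effect a long-tail remedy aims for.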
Virtual reality (VR) conferencing, a typical social VR application, has gained popularity in recent years. It offers users in different locations a fully immersive experience and a sense of togetherness. However, remote communication also introduces inevitable latency, which may adversely affect the so-called social presence. Research on the effect of latency on social presence is still lacking. To fill this gap, this paper examines the impact of latency on social presence in VR conferencing and contrasts it with that in traditional video conferencing. Social presence is measured using the Networked Minds Social Presence Inventory (NMSPI). We design and conduct two conversation-based subjective tests, one for each type of conference, and compare the impact of latency based on the test results. The conclusions of these studies can serve as guidelines for VR service providers to optimize their conference systems.
ISBN: 9798350350456 (digital)
ISBN: 9798350350463 (print)
Perceptual quality metrics derived from deep features have advanced the modelling of how the Human Visual System (HVS) perceives the quality of visual content. In this work, we study the effectiveness of fine-tuning three standard convolutional neural networks (CNNs), namely ResNet50, VGG16, and MobileNetV2, to predict the quality of stereoscopic images in the no-reference setting. This work also aims to understand the impact of using disparity maps for quality prediction. Interestingly, our experiments demonstrate that disparity maps do not significantly contribute to improving perceptual quality estimation in the deep learning framework. To the best of our knowledge, this is the first study that explores the impact of disparity along with the chosen models for Stereoscopic Image Quality Assessment. We present a detailed study of our experiments with various architectural configurations on the LIVE Phase I and II datasets. Further, our results demonstrate the innate capability of deep features for quality prediction. Finally, simple fine-tuning of the models yields solutions that compete with state-of-the-art patch-based stereoscopic image quality assessment methods.
ISBN: 9798331529543 (digital)
ISBN: 9798331529550 (print)
For many use cases, specific regions of interest (ROIs) within a video matter more than the remainder of the video. The Packed Regions Information (PRI) Supplemental Enhancement Information (SEI) message enables packing of rectangular ROIs from an original picture into a smaller-resolution picture for video coding, reducing pixel rate and bitrate. The SEI message signals metadata describing the size and position of the ROIs in the coded picture and in the original picture. Decoders may use the metadata to reconstruct target pictures at the original resolution from the decoded pictures containing the packed regions. The PRI SEI message is under consideration for potential inclusion in a future version of the Versatile Supplemental Enhancement Information (VSEI) standard. Experimental results are provided for use of the PRI SEI message under test conditions for machine analysis of coded video content, showing reductions in bitrate and pixel rate.
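The decoder-side reconstruction described above can be sketched as copying each packed rectangle back to its position in a picture of the original resolution. The field names below are hypothetical and do not reproduce the SEI syntax, and the sketch assumes each region has the same size in the packed and original pictures (no resampling), which the actual message may not require.

```python
from dataclasses import dataclass


@dataclass
class PackedRegion:
    """Hypothetical per-region metadata; not the actual SEI syntax elements."""
    packed_x: int   # top-left corner in the decoded (packed) picture
    packed_y: int
    orig_x: int     # top-left corner in the original picture
    orig_y: int
    width: int      # assumed identical in both pictures
    height: int


def unpack_regions(packed, regions, orig_width, orig_height, fill=0):
    """Rebuild a target picture at the original resolution from packed ROIs."""
    target = [[fill] * orig_width for _ in range(orig_height)]
    for region in regions:
        for dy in range(region.height):
            for dx in range(region.width):
                sample = packed[region.packed_y + dy][region.packed_x + dx]
                target[region.orig_y + dy][region.orig_x + dx] = sample
    return target


# A 4x2 packed picture holding two 2x2 regions side by side.
packed_picture = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
]
region_list = [
    PackedRegion(packed_x=0, packed_y=0, orig_x=1, orig_y=0, width=2, height=2),
    PackedRegion(packed_x=2, packed_y=0, orig_x=3, orig_y=2, width=2, height=2),
]
target_picture = unpack_regions(packed_picture, region_list, orig_width=6, orig_height=4)
```

Here the packed picture carries 8 samples instead of the 24 in the original, which illustrates where the pixel-rate saving comes from; samples outside any ROI are filled with a constant.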