ISBN: (Print) 9781728198354
Over the past few years, deep learning-based methods have revolutionised the field of image quality assessment and have shown remarkable success in estimating the quality of 2D images. However, most of these approaches are trained on small patches of the image, with the subjective score of the entire image serving as the target, which assumes that all patches have an equal perceptual impact on the image. This assumption is not entirely consistent with the human visual system (HVS), which processes different regions of the image to form an overall perception. In this study, we explore the use of saliency information to estimate image quality by selecting only the most perceptually relevant patches. Specifically, we evaluate the accuracy of saliency- or scanpath-based patch selection methods for predicting 2D image quality. Our goal is to determine which approach provides the most accurate estimation of image quality and whether the use of saliency information can improve the performance of deep learning-based methods for image quality assessment.
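As an illustration of saliency-based patch selection, the following minimal sketch ranks non-overlapping patches by their mean saliency and keeps the top k. It is not the authors' method: the function name `top_k_salient_patches`, the non-overlapping tiling, the mean-saliency score, and the toy saliency map are all assumptions made for this example.

```python
def top_k_salient_patches(saliency, patch, k):
    """Return the (row, col) origins of the k patches with the highest mean saliency.

    `saliency` is a 2D list of per-pixel saliency values (hypothetical input);
    patches are non-overlapping squares of side `patch`.
    """
    rows, cols = len(saliency), len(saliency[0])
    scored = []
    for r in range(0, rows - patch + 1, patch):
        for c in range(0, cols - patch + 1, patch):
            total = sum(
                saliency[r + i][c + j]
                for i in range(patch)
                for j in range(patch)
            )
            scored.append((total / (patch * patch), (r, c)))
    # Sort by descending mean saliency; the sort is stable, so ties keep raster order.
    scored.sort(key=lambda item: -item[0])
    return [origin for _, origin in scored[:k]]


# Toy 4x4 saliency map (assumed values), split into four 2x2 patches.
saliency_map = [
    [0.1, 0.1, 0.9, 0.8],
    [0.1, 0.1, 0.7, 0.9],
    [0.2, 0.3, 0.0, 0.0],
    [0.4, 0.3, 0.0, 0.0],
]
selected = top_k_salient_patches(saliency_map, patch=2, k=2)
```

Only the selected patches would then be passed to the quality model. A scanpath-based variant could instead score each patch by how many fixation points fall inside it.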
ISBN: (Print) 9783031581809;9783031581816
Recent approaches to image captioning typically follow an encoder-decoder architecture. Feature vectors extracted from the region proposals of an object detector network serve as input to the encoder. Without any explicit spatial information about the visual regions, the caption synthesis model is limited to learning relationships from captions only. However, the structure between the semantic units in images and in sentences is different. This work introduces a grid-based spatial position encoding scheme to learn relationships from both domains. Furthermore, bilinear pooling is used with attention to exploit spatial and channel-wise attention distributions and capture second-order interactions between multi-modal inputs. These are integrated within the Transformer architecture, achieving a competitive CIDEr score.
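The abstract does not specify how the grid-based position encoding is built. One plausible reading, sketched below purely as an assumption, is to map each region's bounding-box centre to a cell of a fixed G x G grid and use the cell index to look up a learned position embedding. The function name `grid_cell_index`, the box format `(x_min, y_min, x_max, y_max)`, and the row-major indexing are hypothetical.

```python
def grid_cell_index(box, image_width, image_height, grid_size):
    """Map a bounding box to the row-major index of the grid cell holding its centre."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    # Clamp so that a centre lying exactly on the right or bottom edge stays in range.
    col = min(int(cx / image_width * grid_size), grid_size - 1)
    row = min(int(cy / image_height * grid_size), grid_size - 1)
    return row * grid_size + col


# Two assumed region proposals in a 400x300 image, encoded on a 4x4 grid.
boxes = [(0, 0, 100, 100), (300, 200, 400, 300)]
indices = [grid_cell_index(b, 400, 300, grid_size=4) for b in boxes]
```

In a Transformer encoder, each index would typically select a row of a trainable embedding table that is added to the corresponding region feature.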
The newest video coding standard, Versatile Video Coding (VVC), adopts a quad-tree (QT) plus multi-type tree (QTMT) block partition structure and improves the compression performance by about 30% to 50%, compared with t...
Image captioning models are a type of natural language processing (NLP) model designed to generate textual descriptions of images. These models are trained on large datasets of images and captions...
ISBN: (Print) 9798350367164;9798350367157
Single-image high dynamic range image reconstruction has been receiving much attention for recovering image details and showing the possibility of simulating the brightness distribution of the real world. While most current works focus on recovering overexposed areas, this work focuses on underexposed regions and the brightness adjustment of the whole image. This paper proposes an additional plug-in module with a histogram-guided image binning method for low-light image high dynamic range restoration. The plug-in module is built mainly on histogram feature extraction and image binning-based brightness restoration, enhancing the recovery of darker regions. Extensive experimentation demonstrates the effectiveness of the approach in enhancing the visual quality of low-light images and preserving details in underexposed areas. Under extremely low-light conditions, networks using this plug-in module achieve up to a 0.8227 PSNR improvement and a 0.8278 PU21-PSNR improvement.
Overfitting is usually regarded as a negative condition since it impairs the generalisation power of a model. Nevertheless, overfitting a Neural Network (NN) on test data may be advantageous to improve the compression...
This paper explores the potential of a learned two-layer B-frame codec, known as TLZMC. TLZMC is one of the few early attempts that deviate from the hybrid-based coding architecture by skipping motion coding. With TLZ...
ISBN: (Print) 9798350343557
Chest X-ray imaging is critically important for the effective diagnosis of chest diseases, which are increasing today due to various environmental and hereditary factors. Although chest X-ray is the most commonly used modality for detecting pathological abnormalities, interpretation can be quite challenging for specialists due to the misleading locations and sizes of pathological abnormalities, visual similarities, and complex backgrounds. Traditional deep learning (DL) architectures fall short because the areas of pathological abnormality are relatively small and diseased and healthy areas look similar. In addition, DL structures with standard classification approaches are not ideal for problems involving multiple diseases. To overcome these problems, background-independent feature maps are first created using a conventional convolutional neural network (CNN). Then, the relationships between objects in the feature maps are made suitable for multi-label classification tasks using the focal modulation network (FMA), an innovative attention module that is more effective than the self-attention approach. Experiments using a chest X-ray dataset containing both single and multiple labels for a total of 14 different diseases show that the proposed approach can provide superior performance for multi-label datasets.
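To make the multi-label setting concrete, the sketch below shows the generic decision rule that distinguishes it from standard single-label classification: each disease gets an independent sigmoid probability and a threshold, so an image may receive zero, one, or several labels. This is a common convention and not a description of the paper's network; the function name `predict_labels`, the 0.5 threshold, and the example logits are assumptions.

```python
import math


def predict_labels(logits, threshold=0.5):
    """Return the indices of all classes whose sigmoid probability meets the threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [i for i, p in enumerate(probs) if p >= threshold]


# Assumed per-disease logits for one image (four classes shown for brevity).
predicted = predict_labels([2.0, -1.5, 0.3, -0.2])
```

A softmax classifier, by contrast, forces the class probabilities to sum to one and so cannot naturally report two co-occurring diseases.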
Text-to-image person search is challenging due to the cross-scale correspondences and information inequality between modalities. Specifically, images and text are complexly linked at different scales and images are us...
This paper proposes a novel hybrid light field (LF) denoising method based on a convolutional neural network (CNN) designed to reflect the characteristics of LF images in both the pixel and frequency domains. Notin...