ISBN: 9781728180687 (print)
In this paper, a three-channel convolutional neural network (CNN) constrained by multiple loss functions is designed for stereoscopic image quality assessment (SIQA). Because both monocular and binocular information are crucial for SIQA, patches of the left images, right images and difference images are used as the inputs of the three channels, respectively. Since using the image-level ground truth as the label of every patch cannot accurately characterize patch quality, we propose to label each image patch individually, which preserves the quality differences among regions and views. Moreover, a multi-loss structure is adopted to consider local and global features simultaneously, constraining feature learning from multiple perspectives. Additional adaptive loss weights make the multi-loss network more flexible and general. Experimental results show that the proposed method outperforms existing SIQA methods and achieves state-of-the-art performance.
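The three-channel input described in the abstract above can be illustrated with a minimal sketch. It only shows how aligned left, right and difference patches might be cut from a stereo pair. The function name `three_channel_patches`, the patch size, the stride and the plain subtraction used for the difference image are assumptions for illustration, not details from the paper, and per-patch labelling and the multi-loss network are not covered.

```python
import numpy as np

def three_channel_patches(left, right, patch=32, stride=32):
    """Cut aligned patches from the left view, the right view and their difference.

    Assumption: the difference image is a plain pixel-wise subtraction;
    the paper may define it differently.
    """
    left = np.asarray(left, dtype=np.float32)
    right = np.asarray(right, dtype=np.float32)
    if left.shape != right.shape:
        raise ValueError("left and right views must have the same shape")
    diff = left - right
    h, w = left.shape[:2]
    if h < patch or w < patch:
        raise ValueError("image is smaller than one patch")
    out = {"left": [], "right": [], "diff": []}
    # Slide one window over all three images so that patches stay aligned.
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            for name, img in (("left", left), ("right", right), ("diff", diff)):
                out[name].append(img[y:y + patch, x:x + patch])
    return {name: np.stack(p) for name, p in out.items()}
```

Each returned array has one patch per window position, so patch `i` of every channel covers the same image region and can be fed to the corresponding network branch.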
ISBN: 9781728180687 (print)
Near-infrared (NIR) images are robust to ambient light and contain clear textures in low-light conditions. In this paper, we propose NIR image colorization using a spatially adaptive denormalization (SPADE) generator and grayscale-approximated self-reconstruction. Compared with traditional image-to-image translation methods, the proposed NIR colorization pursues photorealism rather than generative diversity. The main challenge of this task is NIR-RGB misregistration in the training data. We address this problem by extracting NIR texture and RGB color separately with an end-to-end SPADE-based model. Moreover, the proposed method enables more precise synthesis when a low-light RGB reference image is given. Experiments on an open NIR-RGB dataset verify that the proposed method effectively preserves NIR textures and RGB colors in the synthesized results and outperforms the baselines in both visual quality and quantitative assessments.
ISBN: 9780819466211 (print)
The direction-adaptive discrete wavelet transform (DA-DWT) locally adapts the filtering direction to the geometric flow in the image. DA-DWT image coders have been shown to achieve a rate-distortion performance superior to non-adaptive wavelet coders. However, since the direction information must always be signalled regardless of the total bit rate, performance at very low bit rates might be worse. In this paper, we propose two scalable direction representations: the layered scheme, which is similar to the scalable motion vector representation in scalable video coding, and the level-unit scheme, which refines the layered scheme with finer granularity. Experimental results indicate that the proposed level-unit scheme achieves desirable performance at both low and high bit rates. A significant improvement in image quality (about 3-5 dB) is observed at very low bit rates, relative to non-scalable coding of the direction information.
ISBN: 9781424412358 (print)
This paper deals with the insertion of a chaotic signature in an image by exploiting the analogy between the wavelet transform and the human visual system (HVS) model, in order to modulate and adapt the signature according to the local characteristics of the image. The blind detection process consists of computing the correlation between the marked DWT coefficients and the watermarking sequence. To address geometric desynchronization, a differential motion estimation technique is employed to compensate for the geometric deformations that the watermarked image may have undergone. The method has been shown to be robust to various image processing operations such as compression, filtering and noise addition, and to various geometric attacks such as translation, rotation, scaling, cropping and resizing.
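The blind correlation detector mentioned in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's scheme: it uses a one-level Haar transform written in NumPy, a logistic-map sequence thresholded to ±1 as the chaotic signature, and a constant embedding strength `alpha` in place of HVS-based modulation. Geometric resynchronization is not covered. The function names and parameters are hypothetical.

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2D Haar analysis of an image with even height and width."""
    lo = (img[0::2] + img[1::2]) / 2.0   # vertical average
    hi = (img[0::2] - img[1::2]) / 2.0   # vertical difference
    ll = (lo[:, 0::2] + lo[:, 1::2]) / 2.0
    lh = (lo[:, 0::2] - lo[:, 1::2]) / 2.0
    hl = (hi[:, 0::2] + hi[:, 1::2]) / 2.0
    hh = (hi[:, 0::2] - hi[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape
    lo, hi = np.empty((h, 2 * w)), np.empty((h, 2 * w))
    lo[:, 0::2], lo[:, 1::2] = ll + lh, ll - lh
    hi[:, 0::2], hi[:, 1::2] = hl + hh, hl - hh
    img = np.empty((2 * h, 2 * w))
    img[0::2], img[1::2] = lo + hi, lo - hi
    return img

def chaotic_signature(key, n):
    """+/-1 sequence from the logistic map x <- 4x(1-x); key in (0, 1) is the seed."""
    if not 0.0 < key < 1.0:
        raise ValueError("key must lie strictly between 0 and 1")
    x, seq = key, np.empty(n)
    for i in range(n):
        x = 4.0 * x * (1.0 - x)
        seq[i] = 1.0 if x > 0.5 else -1.0
    return seq

def embed(img, key, alpha=1.0):
    """Add the signature to the diagonal detail subband (constant strength, no HVS model)."""
    ll, lh, hl, hh = haar_dwt2(np.asarray(img, dtype=float))
    w = chaotic_signature(key, hh.size).reshape(hh.shape)
    return haar_idwt2(ll, lh, hl, hh + alpha * w)

def detect(img, key):
    """Blind detection: correlate the received coefficients with the keyed signature."""
    hh = haar_dwt2(np.asarray(img, dtype=float))[3]
    w = chaotic_signature(key, hh.size).reshape(hh.shape)
    return float(np.mean(hh * w))
```

With the correct key the correlation is close to `alpha`, while an unmarked image or a wrong key gives a value near zero, so comparing the result against a threshold such as `alpha / 2` decides whether the signature is present.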
ISBN: 9798331529543; 9798331529550 (print)
In this paper, we introduce a novel approach to better utilize the intra block copy (IBC) prediction tool when encoding lenslet light field video (LFV) captured with plenoptic 2.0 cameras. Although the IBC tool has been recognized as promising for encoding LFV content, it was originally designed for conventional video, which limits its effectiveness and suggests that slight modifications could better suit the properties of LFV content. Observing the inherently large amount of repetitive image patterns caused by the microlens array (MLA) structure of plenoptic cameras, we propose several techniques that enhance the IBC coding tool itself so that LFV content is encoded more efficiently. Our experimental results demonstrate that the proposed method significantly improves IBC coding performance on LFV content while also reducing encoding time.
ISBN: 9781665475921 (print)
Human-object interaction (HOI) detection is a meaningful research topic in human activity understanding. Recent works have made significant progress by focusing on efficient triplet matching and leveraging image-wide features based on an encoder-decoder architecture. However, in previous methods the ability to gather relevant contextual information about humans is limited, and the different sub-tasks of HOI detection are not explicitly decoupled. To this end, we propose a new transformer-based method for HOI detection, namely the Mask-Guided Transformer (MGT). Our model, which is composed of five parallel decoders with a shared encoder, not only emphasizes interactive regions by applying body features, but also disentangles the prediction of instances and interactions. We achieve a favorable result of 63.3 mAP on the well-known HOI detection dataset V-COCO.
ISBN: 9798331529543; 9798331529550 (print)
To achieve efficient compression for both human vision and machine perception, scalable coding methods have been proposed in recent years. However, existing methods do not fully eliminate the redundancy between features corresponding to different tasks, resulting in suboptimal coding performance. In this paper, we propose a frequency-aware hierarchical image compression framework designed for humans and machines. Specifically, we investigate task relationships from a frequency perspective, using only high-frequency (HF) information for machine vision tasks and leveraging both HF and low-frequency (LF) features for image reconstruction. In addition, a residual-block-embedded octave convolution module is designed to enhance the information interaction between HF and LF features. A dual-frequency channel-wise entropy model is also applied to exploit the correlation between different tasks, thereby improving multi-task performance. The experiments show that the proposed method offers coding gains of -69.3% to -75.3% on machine vision tasks compared with the relevant benchmarks, and gains of -19.1% over the state-of-the-art scalable image codec in terms of image reconstruction quality.
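The HF/LF separation described in the abstract above can be pictured with a fixed two-band split. The sketch below is only an analogy under stated assumptions: it replaces the learned octave convolution with 2x2 average pooling, keeps the LF band at half resolution and defines the HF band as the full-resolution residual. The names `split_bands` and `merge_bands` are hypothetical.

```python
import numpy as np

def _upsample2(lf):
    """Nearest-neighbour upsampling by a factor of two in both directions."""
    return np.repeat(np.repeat(lf, 2, axis=0), 2, axis=1)

def split_bands(x):
    """Split a 2D array with even height and width into (HF, LF) bands.

    LF: 2x2 block averages at half resolution.
    HF: full-resolution residual left after removing the upsampled LF band.
    """
    x = np.asarray(x, dtype=float)
    h, w = x.shape
    lf = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    hf = x - _upsample2(lf)
    return hf, lf

def merge_bands(hf, lf):
    """Reconstruction path: needs both bands to recover the input exactly."""
    return hf + _upsample2(lf)
```

In this toy setting a machine-vision branch would consume only `hf`, whereas reconstruction calls `merge_bands(hf, lf)`; in the paper both bands are learned feature maps rather than fixed filters.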
ISBN: 9781728185514 (print)
For most machine learning systems, overfitting is an undesired behavior. However, overfitting a model to a test image or video at inference time is a favorable and effective technique for improving the coding efficiency of learning-based image and video codecs. At the encoding stage, one or more neural networks that are part of the codec are finetuned on the input image or video to achieve better coding performance. The encoder encodes the input content into a content bitstream. If the finetuned neural network is also part of the decoder, the encoder signals the weight update of the finetuned model to the decoder along with the content bitstream. At the decoding stage, the decoder first updates its neural network model according to the received weight update, and then proceeds with decoding the content bitstream. Since a neural network contains a large number of parameters, compressing the weight update is critical to reducing bitrate overhead. In this paper, we propose learning-based methods to find the parameters that are important to overfit in terms of rate-distortion performance. Based on simple distribution models for the variables in the weight update, we derive two objective functions. By optimizing the proposed objective functions, the importance scores of the parameters can be calculated and the important parameters can be determined. Our experiments on a lossless image compression codec show that the proposed method significantly outperforms a prior-art method in which the overfitted parameters were selected based on heuristics. Furthermore, our technique improved the compression performance of the state-of-the-art lossless image compression codec by 0.1 bit per pixel.
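The encoder/decoder handling of a sparse weight update described in the abstract above can be sketched as follows. The importance scores are taken as given: the paper learns them by optimizing its objective functions, which this sketch does not reproduce. The function names `select_update` and `apply_update` and the top-k selection rule are assumptions for illustration, and quantization and entropy coding of the update are omitted.

```python
import numpy as np

def select_update(delta, scores, k):
    """Encoder side: keep the k weight-update entries with the highest score.

    delta  : finetuned weights minus original weights
    scores : one importance score per parameter (assumed given)
    Returns the flat indices (sorted) and the update values to signal.
    """
    delta = np.asarray(delta, dtype=float)
    scores = np.asarray(scores, dtype=float)
    if delta.shape != scores.shape:
        raise ValueError("delta and scores must have the same shape")
    top = np.argsort(scores.ravel())[::-1][:k]   # indices of the k largest scores
    idx = np.sort(top)
    return idx, delta.ravel()[idx]

def apply_update(weights, idx, values):
    """Decoder side: add the signalled update to its copy of the weights."""
    out = np.array(weights, dtype=float)         # copy, original stays untouched
    flat = out.ravel()                           # view into the copy
    flat[idx] += values
    return out
```

For example, the call `select_update(delta, np.abs(delta), k)` uses the update magnitude as a stand-in score. That is a heuristic of the kind the paper reports outperforming, and it is used here only to make the sketch runnable.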
ISBN: 9781665475921 (print)
In this paper, we propose an end-to-end image compression framework that incorporates Swin Transformer modules to capture localized and non-localized similarities in image compression. In particular, the Swin Transformer modules are deployed in the analysis and synthesis stages, interleaved with convolution layers. The transformer layers are expected to provide more flexible receptive fields, so that spatially localized and non-localized redundancies can be eliminated more effectively. The proposed method demonstrates an excellent capability for signal conjunction and prediction, which improves the rate-distortion performance. Experimental results show that the proposed method is superior to existing methods on both natural scene and screen content images, achieving 22.46% BD-Rate savings compared with BPG. BD-Rate gains of over 30% are observed on screen content images compared with the classical hyper-prior end-to-end coding method.
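The BD-Rate figures quoted in the abstract above are usually computed with the Bjøntegaard delta method. A minimal sketch of the common formulation follows: fit a cubic polynomial to log-rate as a function of PSNR for each codec, integrate both fits over the overlapping PSNR range, and convert the average log-rate difference into a percentage. The function name `bd_rate` is hypothetical, and the paper may have used a different interpolation variant.

```python
import numpy as np

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Average bitrate difference (%) of the test codec against the reference.

    Negative values mean the test codec needs less bitrate at equal quality.
    Each curve needs at least four rate/PSNR points for the cubic fit.
    """
    psnr_ref = np.asarray(psnr_ref, dtype=float)
    psnr_test = np.asarray(psnr_test, dtype=float)
    log_ref = np.log(np.asarray(rate_ref, dtype=float))
    log_test = np.log(np.asarray(rate_test, dtype=float))

    # Cubic fit of log(rate) as a function of PSNR for each codec.
    fit_ref = np.polyfit(psnr_ref, log_ref, 3)
    fit_test = np.polyfit(psnr_test, log_test, 3)

    # Integrate both fits over the PSNR interval that the two curves share.
    lo = max(psnr_ref.min(), psnr_test.min())
    hi = min(psnr_ref.max(), psnr_test.max())
    if hi <= lo:
        raise ValueError("the two curves have no overlapping PSNR range")
    int_ref = np.polyval(np.polyint(fit_ref), hi) - np.polyval(np.polyint(fit_ref), lo)
    int_test = np.polyval(np.polyint(fit_test), hi) - np.polyval(np.polyint(fit_test), lo)

    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return float((np.exp(avg_log_diff) - 1.0) * 100.0)
```

Under this sign convention, a codec that needs 20% less bitrate than the reference at every PSNR level yields a BD-Rate of about -20, which corresponds to what the abstract reports as savings.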
ISBN: 9781479948741 (print)
Hyperspectral imaging captures a high number of spectrally narrow bands and provides advantages for image analysis applications such as identification and classification in particular. However, for the visual inspection of hyperspectral images, the data is conventionally converted to a standard color image format. It is important that as much detail and data as possible is retained during this conversion. A novel hyperspectral visualization approach based on high dynamic range imaging is presented in this paper. The proposed approach retains visual detail and provides a superior result in terms of visual quality.