ISBN (print): 9781728185514
For most machine learning systems, overfitting is an undesired behavior. However, overfitting a model to a test image or video at inference time is a favorable and effective technique for improving the coding efficiency of learning-based image and video codecs. At the encoding stage, one or more neural networks that are part of the codec are finetuned on the input image or video to achieve better coding performance. The encoder encodes the input content into a content bitstream. If the finetuned neural network is also part of the decoder, the encoder signals the weight update of the finetuned model to the decoder along with the content bitstream. At the decoding stage, the decoder first updates its neural network model according to the received weight update and then proceeds to decode the content bitstream. Since a neural network contains a large number of parameters, compressing the weight update is critical to reducing the bitrate overhead. In this paper, we propose learning-based methods to find the parameters whose overfitting matters most in terms of rate-distortion performance. Based on simple distribution models for the variables in the weight update, we derive two objective functions. By optimizing the proposed objective functions, importance scores for the parameters can be calculated and the important parameters determined. Our experiments on a lossless image compression codec show that the proposed method significantly outperforms a prior-art method in which the overfitted parameters were selected heuristically. Furthermore, our technique improves the compression performance of the state-of-the-art lossless image compression codec by 0.1 bit per pixel.
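As a minimal sketch of the general idea (not the paper's derived objectives), the selection step can be viewed as ranking weight-update entries by an importance score and signalling only the top-k; the magnitude-based score below is a stand-in, and all names are illustrative:

```python
import torch

def select_important_updates(weight_update: torch.Tensor,
                             importance: torch.Tensor,
                             k: int) -> torch.Tensor:
    """Keep only the k entries of the weight update with the highest
    importance scores; zero out the rest so they need not be signalled."""
    flat_update = weight_update.flatten()
    flat_scores = importance.flatten()
    topk = torch.topk(flat_scores, k).indices
    sparse = torch.zeros_like(flat_update)
    sparse[topk] = flat_update[topk]
    return sparse.view_as(weight_update)

# Hypothetical usage: in the paper the scores come from optimizing a
# rate-distortion objective; here |update| serves as a stand-in score.
update = torch.randn(64, 64)
scores = update.abs()
sparse_update = select_important_updates(update, scores, k=256)
```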
ISBN (print): 9798331529543; 9798331529550
To achieve efficient compression for both human vision and machine perception, scalable coding methods have been proposed in recent years. However, existing methods do not fully eliminate the redundancy between features corresponding to different tasks, resulting in suboptimal coding performance. In this paper, we propose a frequency-aware hierarchical image compression framework designed for both humans and machines. Specifically, we investigate task relationships from a frequency perspective, utilizing only high-frequency (HF) information for machine vision tasks and leveraging both HF and low-frequency (LF) features for image reconstruction. In addition, a residual-block-embedded octave convolution module is designed to enhance the information interaction between HF and LF features. Furthermore, a dual-frequency channel-wise entropy model is applied to exploit the correlation between different tasks, thereby improving multi-task performance. Experiments show that the proposed method offers -69.3% to -75.3% coding gains on machine vision tasks compared with the relevant benchmarks, and -19.1% gains over the state-of-the-art scalable image codec in terms of image reconstruction quality.
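The octave convolution underlying the HF/LF interaction can be sketched as follows. This is a generic octave convolution in PyTorch, not the paper's residual-block-embedded module, and the channel counts are arbitrary:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctaveConv(nn.Module):
    """Minimal octave convolution: processes a high-frequency map at full
    resolution and a low-frequency map at half resolution, with cross paths."""
    def __init__(self, ch_h: int, ch_l: int):
        super().__init__()
        self.h2h = nn.Conv2d(ch_h, ch_h, 3, padding=1)
        self.h2l = nn.Conv2d(ch_h, ch_l, 3, padding=1)
        self.l2l = nn.Conv2d(ch_l, ch_l, 3, padding=1)
        self.l2h = nn.Conv2d(ch_l, ch_h, 3, padding=1)

    def forward(self, x_h, x_l):
        # HF output: HF path plus upsampled LF contribution.
        y_h = self.h2h(x_h) + F.interpolate(self.l2h(x_l),
                                            size=x_h.shape[-2:], mode="nearest")
        # LF output: LF path plus downsampled HF contribution.
        y_l = self.l2l(x_l) + self.h2l(F.avg_pool2d(x_h, 2))
        return y_h, y_l

x_h = torch.randn(1, 48, 64, 64)   # high-frequency features
x_l = torch.randn(1, 16, 32, 32)   # low-frequency features, half resolution
y_h, y_l = OctaveConv(48, 16)(x_h, x_l)
```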
ISBN (print): 9781728180687
A 4-dimensional (4D) image can be viewed as a stack of volumetric images over channels of observation depth or temporal frames. Such data contain rich information at the cost of high storage and transmission demands due to their large volume. In this paper, we present a lossless 4D image compression algorithm that extends the CCSDS-123.0-B-1 standard. Instead of separately compressing the volumetric image at each channel of a 4D image, the proposed algorithm efficiently exploits the redundancy across the fourth dimension of the data. Experiments conducted on two types of 4D images demonstrate the effectiveness of the proposed lossless compression method.
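A minimal illustration of decorrelation along the fourth dimension: the CCSDS-123.0-B-1 predictor itself is adaptive and far more elaborate, but simple channel differencing, sketched below, already demonstrates the lossless round trip the extension relies on:

```python
import numpy as np

def fourth_dim_residuals(img4d: np.ndarray) -> np.ndarray:
    """Predict each volumetric channel from the previous one along the
    fourth dimension and return residuals (first channel kept as-is).
    Lossless: the original is recovered by a cumulative sum along axis 0."""
    residuals = img4d.copy()
    residuals[1:] = img4d[1:] - img4d[:-1]
    return residuals

def reconstruct(residuals: np.ndarray) -> np.ndarray:
    return np.cumsum(residuals, axis=0)

# Hypothetical 4D data: 8 channels of 16 x 64 x 64 volumes.
data = np.random.randint(0, 4096, size=(8, 16, 64, 64), dtype=np.int32)
assert np.array_equal(reconstruct(fourth_dim_residuals(data)), data)
```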
ISBN (print): 9781728180687
A large portion of fake news quotes untampered images from other sources with ulterior motives rather than resorting to image forgery. Such elaborate engraftments keep the inconsistency between the images and the text reports stealthy, thereby passing off the spurious as genuine. This paper proposes an architecture named News Image Steganography (NIS) to reveal this inconsistency through GAN-based image steganography. An extractive summarization of a news image is generated from its source texts, and a learned steganographic algorithm encodes and decodes the summarization in the image in a manner that approaches perceptual invisibility. Once an encoded image is quoted, its source summarization can be decoded and presented as ground truth to verify the quoting news. The paired encoder and decoder endow images with the capability to carry their imperceptible summarization along. Our experiments and investigations show that NIS reveals the underlying inconsistency and thereby improves the identification accuracy of fake news that engrafts untampered images.
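For illustration only, the encode/decode round trip can be mimicked with plain LSB substitution; the paper instead learns the embedding with a GAN to approach perceptual invisibility, and the summary text below is hypothetical:

```python
import numpy as np

def embed_bits(image: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Hide a bit string in the least significant bits of an image.
    NIS learns this mapping with a GAN; LSB substitution here only
    illustrates the encode/decode round trip."""
    flat = image.flatten().copy()
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
    return flat.reshape(image.shape)

def extract_bits(image: np.ndarray, n: int) -> np.ndarray:
    return image.flatten()[:n] & 1

summary = "quoted from source X"  # hypothetical extractive summary
bits = np.unpackbits(np.frombuffer(summary.encode(), dtype=np.uint8))
cover = np.random.randint(0, 256, size=(128, 128), dtype=np.uint8)
stego = embed_bits(cover, bits)
recovered = np.packbits(extract_bits(stego, bits.size)).tobytes().decode()
assert recovered == summary
```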
ISBN (print): 9781665435536
In visual inspection, quality assurance is difficult because results vary with the skill and fatigue of the inspector. Recently, visual inspection methods based on image processing with deep learning have been proposed. When using deep learning, the dataset is crucial. In this paper, we describe a method that detects painting defects using image processing, automatically generates training data for deep learning, and uses these data for classification with deep learning.
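A hedged sketch of what automatic data generation might look like, assuming defects can be emulated by stamping dark blobs onto clean surface images (the paper's actual generation procedure may differ):

```python
import numpy as np

def synthesize_defect(clean: np.ndarray, rng: np.random.Generator):
    """Stamp a random dark circular blob onto a clean surface image to
    emulate a painting defect; returns the image and label (1 = defective)."""
    img = clean.copy()
    h, w = img.shape
    cy, cx = rng.integers(8, h - 8), rng.integers(8, w - 8)
    r = rng.integers(2, 6)
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2
    img[mask] = rng.integers(0, 60)   # defect darker than the paint
    return img, 1

rng = np.random.default_rng(0)
clean = np.full((64, 64), 200, dtype=np.uint8)   # uniform painted surface
dataset = [synthesize_defect(clean, rng) for _ in range(100)] \
        + [(clean, 0)] * 100                      # defect-free examples
```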
ISBN (print): 9781665475921
A tensor display is a type of 3D light field display composed of multiple transparent screens and a backlight; it can render a scene with correct depth, allowing a 3D scene to be viewed without glasses. The analysis of state-of-the-art tensor displays assumes that the content is Lambertian. To extend their capabilities, we analyze the limitations of displaying non-Lambertian scenes and propose a new method to factorize non-Lambertian scenes using disparity analysis. Moreover, we demonstrate a new prototype of a tensor display with three layers of full-HD content at 60 fps. The evaluation results verify that, compared with the state of the art, the proposed non-Lambertian rendering method achieves higher display quality for non-Lambertian scenes in both simulation and on the prototyped tensor display.
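To give a flavor of layered-display factorization, the sketch below fits a simplified two-layer multiplicative display to a small light field by gradient descent, using circular shifts as a crude stand-in for the view-dependent ray geometry; the paper's disparity-based non-Lambertian factorization goes well beyond this:

```python
import torch

def render_views(layers: torch.Tensor, shifts: list) -> torch.Tensor:
    """Render each view of a two-layer multiplicative display: a ray for
    view v multiplies the back layer by the front layer shifted by the
    view-dependent parallax (circular shift used as a simplification)."""
    back, front = layers[0], layers[1]
    views = [back * torch.roll(front, s, dims=1) for s in shifts]
    return torch.stack(views)

# Fit the layer transmittances to a target light field.
target = torch.rand(5, 64, 64)                 # 5 views of a 64x64 scene
layers = torch.rand(2, 64, 64, requires_grad=True)
opt = torch.optim.Adam([layers], lr=0.05)
shifts = [-2, -1, 0, 1, 2]
for _ in range(200):
    loss = ((render_views(layers.clamp(0, 1), shifts) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```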
ISBN (print): 9781728185514
With the rapid development of whole-brain imaging technology, a large number of brain images have been produced, creating a great demand for efficient brain image compression methods. At present, the most commonly used compression methods are all based on the 3-D wavelet transform, such as JP3D. However, traditional 3-D wavelet transforms are designed manually under certain assumptions on the signal, and brain images are not as ideal as assumed; moreover, such transforms are not directly optimized for the compression task. To solve these problems, we propose a trainable 3-D wavelet transform based on the lifting scheme, in which the predict and update steps are replaced by 3-D convolutional neural networks. The proposed transform is then embedded into an end-to-end compression scheme called iWave3D, which is trained on a large number of brain images to directly minimize the rate-distortion loss. Experimental results demonstrate that our method outperforms JP3D significantly, by 2.012 dB in terms of average BD-PSNR.
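A single trainable lifting step along the depth axis might look like the following sketch, with 3-D convolutions in the predict and update roles. This is not the iWave3D architecture, only an illustration of why the lifting structure stays invertible regardless of the learned filters:

```python
import torch
import torch.nn as nn

class LiftingStep(nn.Module):
    """One lifting step: split into even/odd samples along the depth axis,
    predict odd from even with a small 3-D CNN, update even from the
    detail signal. Invertible by running the steps in reverse."""
    def __init__(self, ch: int = 1):
        super().__init__()
        self.predict = nn.Conv3d(ch, ch, 3, padding=1)
        self.update = nn.Conv3d(ch, ch, 3, padding=1)

    def forward(self, x):                    # x: (N, C, D, H, W), D even
        even, odd = x[:, :, 0::2], x[:, :, 1::2]
        detail = odd - self.predict(even)    # high-pass subband
        approx = even + self.update(detail)  # low-pass subband
        return approx, detail

    def inverse(self, approx, detail):
        even = approx - self.update(detail)
        odd = detail + self.predict(even)
        x = torch.empty(even.shape[0], even.shape[1],
                        even.shape[2] * 2, *even.shape[3:])
        x[:, :, 0::2], x[:, :, 1::2] = even, odd
        return x

step = LiftingStep()
x = torch.randn(1, 1, 8, 16, 16)
approx, detail = step(x)
assert torch.allclose(step.inverse(approx, detail), x, atol=1e-5)
```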
ISBN (print): 9781728180687
Car counting in drone-based images is a challenging task in computer vision. Most advanced counting methods are based on density maps. Usually, density maps are first generated by convolving ground-truth point maps with a Gaussian kernel for later model learning (generation). Then, the counting network learns to predict density maps from input images (estimation). Most studies focus on the estimation problem while overlooking the generation problem. In this paper, a training framework is proposed to generate density maps by learning and to train the generation and estimation subnetworks jointly. Experiments demonstrate that our method outperforms other density-map-based methods and achieves the best performance on drone-based car counting.
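The conventional generation step that the paper replaces with a learned subnetwork is fixed Gaussian-kernel smoothing of the point map, sketched below (the sigma value is an arbitrary choice here):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(points, shape, sigma=4.0):
    """Standard density-map generation: place a unit impulse at each
    annotated car center and convolve with a Gaussian kernel, so the
    map integrates (approximately) to the object count."""
    impulses = np.zeros(shape, dtype=np.float64)
    for y, x in points:
        impulses[int(y), int(x)] += 1.0
    return gaussian_filter(impulses, sigma=sigma)

dm = density_map([(30, 40), (80, 120)], shape=(256, 256))
assert abs(dm.sum() - 2.0) < 1e-3
```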
ISBN (print): 9781728180687
Access to technologies like mobile phones contributes to the significant increase in the volume of digital visual data (images and videos). In addition, photo editing software is becoming increasingly powerful and easy to use. In some cases, these tools can be utilized to produce forgeries with the objective of changing the semantic meaning of a photo or a video (e.g., fake news). Digital image forensics (DIF) has two main objectives: the detection (and localization) of forgery, and the identification of the origin of the acquisition (i.e., sensor identification). Since 2005, many classical methods for DIF have been designed, implemented, and tested on several databases. Meanwhile, innovative approaches based on deep learning have emerged in other fields and have surpassed traditional techniques. In the context of DIF, deep learning methods mainly use convolutional neural networks (CNNs) associated with significant preprocessing modules. This is an active domain, and two possible ways to operate preprocessing have been studied: prior to the network or incorporated into it. None of the various studies on digital image forensics provides a comprehensive overview of the preprocessing techniques used with deep learning methods. Therefore, the core objective of this article is to review the preprocessing modules associated with CNN models.
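One widely used example of preprocessing applied prior to the network is a fixed high-pass residual filter that suppresses scene content so the CNN focuses on noise-like traces; the sketch below shows this variant with frozen weights (making the weights trainable would correspond to incorporating the preprocessing into the network):

```python
import torch
import torch.nn as nn

class HighPassPreprocess(nn.Module):
    """Fixed high-pass residual filter applied before a forensics CNN:
    suppresses image content, leaving noise-like manipulation traces."""
    def __init__(self):
        super().__init__()
        kernel = torch.tensor([[-1., 2., -1.],
                               [ 2., -4., 2.],
                               [-1., 2., -1.]]) / 4.0
        self.conv = nn.Conv2d(1, 1, 3, padding=1, bias=False)
        self.conv.weight.data = kernel.view(1, 1, 3, 3)
        self.conv.weight.requires_grad = False  # "prior to" variant: frozen

    def forward(self, x):
        return self.conv(x)

residual = HighPassPreprocess()(torch.rand(1, 1, 128, 128))
```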
ISBN (print): 9798331529543; 9798331529550
Previous Deepfake detection methods perform well within their training domains, but their effectiveness diminishes significantly with new synthesis techniques. Recent studies have revealed that detection models draw decision boundaries based on facial identity rather than synthetic artifacts, leading to poor cross-domain performance. To address this issue, we propose FRIDAY, a novel training method that attenuates facial identity using a face recognizer. Specifically, we first train a face recognizer using the same backbone as the Deepfake detector. We then freeze the recognizer and use it during the detector's training to suppress facial identity information. This is achieved by feeding input images into both the recognizer and the detector, then minimizing the similarity of their feature embeddings with our Facial Identity Attenuating loss. This process encourages the detector to produce embeddings distinct from the recognizer's, effectively attenuating facial identity. Comprehensive experiments demonstrate that our approach significantly improves detection performance on both in-domain and cross-domain datasets.
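A sketch of a training objective consistent with this description, assuming a cosine-similarity penalty and a weighting factor lam (both are assumptions; the paper's exact loss may differ):

```python
import torch
import torch.nn.functional as F

def friday_loss(det_emb, rec_emb, logits, labels, lam=0.1):
    """Detection loss plus a term penalizing similarity between the
    detector's embeddings and a frozen face recognizer's embeddings,
    attenuating identity information in the detector's features."""
    cls = F.cross_entropy(logits, labels)
    sim = F.cosine_similarity(det_emb, rec_emb.detach(), dim=1).abs().mean()
    return cls + lam * sim

det_emb = torch.randn(8, 512, requires_grad=True)   # detector features
rec_emb = torch.randn(8, 512)                       # frozen recognizer
logits = torch.randn(8, 2, requires_grad=True)      # real/fake scores
labels = torch.randint(0, 2, (8,))
friday_loss(det_emb, rec_emb, logits, labels).backward()
```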