ISBN (Print): 9798331529543; 9798331529550
In this paper we introduce a novel approach to better utilize the intra block copy (IBC) prediction tool when encoding lenslet light field video (LFV) captured with plenoptic 2.0 cameras. Although IBC has been recognized as promising for encoding LFV content, it was originally designed for conventional video, which limits its effectiveness and leaves room for modifications better suited to the properties of LFV content. Observing the inherently large number of repetitive image patterns produced by the microlens array (MLA) structure of plenoptic cameras, we propose several techniques that enhance the IBC coding tool itself to encode LFV content more efficiently. Experimental results demonstrate that the proposed method significantly improves IBC coding performance on LFV content while concurrently reducing encoding time.
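The core observation above can be illustrated with a toy sketch (not the paper's actual algorithm): if lenslet content repeats at the microlens pitch, an IBC search that tries pitch-aligned displacements first tends to find a near-zero residual after very few comparisons. The 1-D search below and the pitch value are hypothetical simplifications for illustration.

```python
# Toy 1-D intra-block-copy (IBC) search, a hypothetical illustration:
# lenslet content repeats at the microlens pitch, so pitch-aligned
# displacements are tried first and usually terminate the search
# early with a near-zero residual.

def sad(a, b):
    """Sum of absolute differences between two equal-length blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))

def ibc_search(row, start, block, pitch):
    """Search previously coded samples row[:start] for the best
    predictor of `block`, preferring pitch-aligned displacements."""
    n = len(block)
    offsets = sorted(range(n, start + 1), key=lambda d: (d % pitch != 0, d))
    best_off, best_cost = None, float("inf")
    for d in offsets:
        cost = sad(row[start - d:start - d + n], block)
        if cost < best_cost:
            best_off, best_cost = d, cost
            if cost == 0:                    # exact repeat: stop searching
                break
    return best_off, best_cost

# A periodic "lenslet" row whose microlens pitch is 4 samples.
row = [10, 50, 30, 20] * 8
print(ibc_search(row, start=16, block=row[16:20], pitch=4))  # (4, 0)
```

With the pitch-aligned candidate checked first, the search stops after a single SAD evaluation; an unordered full search would have evaluated many more candidates.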
ISBN (Print): 9781665475921
In this paper, we propose an end-to-end image compression framework that incorporates Swin Transformer modules to capture both localized and non-localized similarities. In particular, the Swin Transformer modules are deployed in the analysis and synthesis stages, interleaved with convolution layers. The transformer layers are expected to perceive more flexible receptive fields, so that spatially localized and non-localized redundancies can be eliminated more effectively. The proposed method exhibits an excellent capability for signal aggregation and prediction, leading to improved rate-distortion performance. Experimental results show that the proposed method is superior to existing methods on both natural-scene and screen-content images, achieving 22.46% BD-rate savings compared with BPG. Over 30% BD-rate gains are observed on screen-content images when compared with the classical hyper-prior end-to-end coding method.
ISBN (Print): 9781479948741
Hyperspectral imaging captures a large number of spectrally narrow bands and offers advantages for image analysis applications, particularly identification and classification. For visual inspection, however, the data is conventionally converted to a standard color image format, and it is important that as much detail as possible is retained during this conversion. This paper presents a novel hyperspectral visualization approach based on high dynamic range imaging. The proposed approach retains visual detail and provides a superior result in terms of visual quality.
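The general idea of an HDR-style conversion can be sketched as follows. This is a heavily simplified stand-in, not the paper's pipeline: bands are fused by a naive average and the dynamic range is compressed with a global log tone map before quantizing to display values. All function names and parameters here are illustrative assumptions.

```python
import math

# Minimal sketch (not the paper's exact method): fuse many narrow
# spectral bands per pixel into one radiance value, then apply a
# simple global log tone map so high-dynamic-range detail survives
# conversion to an 8-bit display value.

def tone_map(bands, eps=1e-6):
    """bands: list of per-band radiances for one pixel."""
    radiance = sum(bands) / len(bands)      # naive band fusion
    return math.log(radiance + eps)         # compress dynamic range

def to_8bit(values):
    """Normalize tone-mapped values to the 0-255 display range."""
    lo, hi = min(values), max(values)
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    return [round((v - lo) * scale) for v in values]

pixels = [[0.01] * 4, [1.0] * 4, [100.0] * 4]   # 3 pixels, 4 bands each
codes = to_8bit([tone_map(p) for p in pixels])
print(codes)   # dark, mid, bright pixels all land in displayable range
```

A linear mapping would crush the dark pixel to zero next to the bright one; the log tone map keeps all three pixels distinguishable, which is the detail-retention property the abstract argues for.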
ISBN (Print): 9781728185514
For most machine learning systems, overfitting is undesired behavior. However, overfitting a model to a test image or video at inference time is a favorable and effective technique for improving the coding efficiency of learning-based image and video codecs. At the encoding stage, one or more neural networks that are part of the codec are finetuned on the input image or video to achieve better coding performance. The encoder encodes the input content into a content bitstream. If the finetuned neural network is also part of the decoder, the encoder signals the weight update of the finetuned model to the decoder along with the content bitstream. At the decoding stage, the decoder first updates its neural network model according to the received weight update and then decodes the content bitstream. Since a neural network contains a large number of parameters, compressing the weight update is critical to reducing the bitrate overhead. In this paper, we propose learning-based methods to find the parameters worth overfitting in terms of rate-distortion performance. Based on simple distribution models for the variables in the weight update, we derive two objective functions. By optimizing these objective functions, importance scores for the parameters can be calculated and the important parameters determined. Our experiments on a lossless image compression codec show that the proposed method significantly outperforms a prior-art method in which overfitted parameters were selected heuristically. Furthermore, our technique improves the compression performance of the state-of-the-art lossless image compression codec by 0.1 bit per pixel.
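The trade-off being optimized can be sketched with a hand-crafted scoring rule (the paper learns this selection; the score below is only an illustrative analogue). A weight update is worth signalling when its estimated distortion gain exceeds the bits needed to transmit it; the Laplacian rate model and the constants are assumptions for the sketch.

```python
import math

# Simplified sketch of rate-distortion-aware parameter selection.
# Each candidate weight update is kept only if its estimated
# distortion gain outweighs the cost of signalling it, assuming
# update magnitudes follow a Laplacian-like distribution.

def rate_bits(delta, b=0.1):
    """Approximate code length of one update under a Laplacian prior."""
    return abs(delta) / b / math.log(2) + 1.0   # +1 flag bit

def select_updates(deltas, gains, lam=0.05):
    """Keep updates whose gain exceeds lam * signalling cost."""
    kept = {}
    for i, (d, g) in enumerate(zip(deltas, gains)):
        if g > lam * rate_bits(d):
            kept[i] = d
    return kept

deltas = [0.5, 0.01, -0.3, 0.002]
gains  = [0.9, 0.01,  0.6, 0.001]   # per-parameter distortion reduction
print(select_updates(deltas, gains))  # {0: 0.5, 2: -0.3}
```

Only the two updates with substantial gains survive; the tiny updates are dropped because their signalling cost outweighs their benefit, which is exactly the pruning the importance scores perform.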
ISBN (Print): 9798331529543; 9798331529550
To achieve efficient compression for both human vision and machine perception, scalable coding methods have been proposed in recent years. However, existing methods do not fully eliminate the redundancy between features corresponding to different tasks, resulting in suboptimal coding performance. In this paper, we propose a frequency-aware hierarchical image compression framework designed for humans and machines. Specifically, we investigate task relationships from a frequency perspective, utilizing only high-frequency (HF) information for machine vision tasks and leveraging both HF and low-frequency (LF) features for image reconstruction. In addition, a residual-block-embedded octave convolution module is designed to enhance the information interaction between HF and LF features. A dual-frequency channel-wise entropy model is further applied to exploit the correlation between different tasks, thereby improving multi-task performance. Experiments show that the proposed method offers -69.3% to -75.3% coding gains on machine vision tasks compared to the relevant benchmarks, and -19.1% gains over the state-of-the-art scalable image codec in terms of image reconstruction quality.
ISBN (Print): 0819444111
In this paper, we propose a watermarking method that does not require the original image to extract embedded data. In our proposal, watermark data are embedded using the differences between neighboring wavelet coefficients of the lowest band in the wavelet domain. Although a similar concept has already been proposed by H.S. Kim et al., their method needs the original image to extract the embedded data. By modifying the embedding algorithm, we developed a new scheme that extracts the embedded data without the original image. Simulation results show that the proposed scheme yields good picture quality in the watermarked image and robustness to several types of image processing attacks, including JPEG compression.
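The blind-extraction property can be sketched as follows, under the assumption that each bit is written into the sign of the difference of a neighboring coefficient pair (the paper's exact embedding rule may differ; the coefficient values here are stand-ins, not real wavelet outputs).

```python
# Minimal sketch of blind embedding in (stand-in) lowest-band wavelet
# coefficients: each bit is written into the *difference* of a
# neighboring coefficient pair, so extraction needs no original image.
# The margin T trades imperceptibility against robustness.

def embed(coeffs, bits, T=4.0):
    out = list(coeffs)
    for k, bit in enumerate(bits):
        a, b = out[2 * k], out[2 * k + 1]
        mid = (a + b) / 2.0
        d = T / 2.0 if bit else -T / 2.0     # force the sign of (a - b)
        out[2 * k], out[2 * k + 1] = mid + d, mid - d
    return out

def extract(coeffs, nbits):
    """Blind extraction: only the watermarked coefficients are needed."""
    return [1 if coeffs[2 * k] > coeffs[2 * k + 1] else 0
            for k in range(nbits)]

coeffs = [120.0, 118.0, 95.0, 99.0, 60.0, 60.5, 33.0, 31.0]
wm = embed(coeffs, [1, 0, 1, 0])
print(extract(wm, 4))   # [1, 0, 1, 0], recovered without the original
```

Because each pair's mean is preserved and only the difference is forced past the margin T, the perturbation stays small, and any attack that leaves the sign of the difference intact (e.g., mild JPEG quantization) leaves the bit readable.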
ISBN (Print): 0819444111
As media processing gradually migrates from hardware to software-programmable platforms, the number of media processing functions added to the media processor grows even faster than the ever-increasing media processor power can support. Computational-complexity-scalable algorithms thus become powerful vehicles for implementing many time-critical yet complexity-constrained applications, such as MPEG-2 video decoding. In this paper, we present an adaptive, resource-constrained, complexity-scalable MPEG-2 video decoding scheme that makes a good trade-off between decoding complexity and output quality. Based on the available computational resources and the energy level of B-frame residuals, the scalable decoding algorithm selectively decodes B-residual blocks to significantly reduce system complexity. Furthermore, we describe an iterative procedure that dynamically adjusts the complexity levels to achieve the best possible output quality under a given resource constraint. Experimental results show that up to 20% of the total computational complexity can be saved with satisfactory output visual quality.
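The selective-decoding idea can be sketched in a few lines. This is a toy model, not the paper's decoder: block energies and per-block costs are invented numbers, and the iterative adjustment is a simple threshold sweep that drops the lowest-energy B-residual blocks until the modeled cost fits the budget.

```python
# Sketch of the complexity/quality trade-off: B-frame residual blocks
# are decoded only if their energy crosses a threshold, and the
# threshold is raised iteratively until the (modeled) decode cost
# fits the available budget.

def decode_cost(energies, thresh, per_block=1.0):
    """Blocks below thresh are skipped and cost nothing."""
    return sum(per_block for e in energies if e >= thresh)

def fit_threshold(energies, budget, step=1.0):
    thresh = 0.0
    while decode_cost(energies, thresh) > budget:
        thresh += step              # drop lowest-energy blocks first
    return thresh

energies = [0.2, 5.0, 1.4, 9.3, 0.1, 2.2, 7.7, 0.4]
t = fit_threshold(energies, budget=4)       # resources for 4 blocks
decoded = [e for e in energies if e >= t]
print(t, decoded)                           # high-energy blocks survive
```

Skipping low-energy residuals first is what keeps the quality loss "satisfactory": those blocks contribute the least to the reconstructed picture, so the visible impact of not decoding them is smallest.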
ISBN (Print): 9781728180687
A 4-dimensional (4D) image can be viewed as a stack of volumetric images over channels of observation depth or temporal frames. Such data contains rich information but, due to its large volume, places high demands on storage and transmission resources. In this paper, we present a lossless 4D image compression algorithm that extends the CCSDS-123.0-B-1 standard. Instead of separately compressing the volumetric image at each channel, the proposed algorithm efficiently exploits redundancy across the fourth dimension of the data. Experiments conducted on two types of 4D images demonstrate the effectiveness of the proposed lossless compression method.
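The gain from exploiting the fourth dimension can be shown with a toy lossless scheme (not the CCSDS predictor itself; `zlib` stands in for the standard's entropy coder, and channels are flattened lists rather than volumes). Each channel is predicted from the previous one and only the residuals are coded, so the round trip is exactly lossless.

```python
import zlib

# Toy lossless scheme in the spirit of the abstract: instead of
# compressing each channel independently, predict channel c from
# channel c-1 and entropy-code only the residuals (zlib stands in
# for a real entropy coder).  The round trip is exactly lossless.

def compress_4d(channels):
    """channels: list of equal-length lists of byte values (0-255)."""
    prev = [0] * len(channels[0])
    residuals = bytearray()
    for ch in channels:
        residuals.extend((x - p) % 256 for x, p in zip(ch, prev))
        prev = ch
    return zlib.compress(bytes(residuals))

def decompress_4d(blob, n_channels, n_samples):
    res = zlib.decompress(blob)
    prev, out = [0] * n_samples, []
    for c in range(n_channels):
        ch = [(res[c * n_samples + i] + prev[i]) % 256
              for i in range(n_samples)]
        out.append(ch)
        prev = ch
    return out

channels = [[10, 20, 30, 40], [12, 21, 33, 41], [13, 22, 34, 43]]
blob = compress_4d(channels)
print(decompress_4d(blob, 3, 4) == channels)   # True: lossless
```

When adjacent channels are highly correlated, the residuals are small and near-constant, which compresses far better than the raw channels would independently.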
ISBN (Print): 9781728180687
A large portion of fake news quotes untampered images from other sources with ulterior motives rather than resorting to image forgery. Such elaborate engraftments keep the inconsistency between images and text reports stealthy and thereby pass off the spurious as genuine. This paper proposes an architecture named News Image Steganography (NIS) to reveal this inconsistency through GAN-based image steganography. An extractive summarization of a news image is generated from its source texts, and a learned steganographic algorithm encodes and decodes the summarization in the image in a manner that approaches perceptual invisibility. Once an encoded image is quoted, its source summarization can be decoded and presented as ground truth to verify the quoting news. The pairwise encoder and decoder endow images with the capability to carry their imperceptible summarization along. According to our experiments and investigations, NIS reveals the underlying inconsistency and thereby improves the identification accuracy of fake news that engrafts untampered images.
ISBN (Print): 0819444111
We develop a method for automatic segmentation of natural video sequences based on low-level spatial and temporal analyses. It features three designs that facilitate good region segmentation while keeping the computational complexity at a reasonable level. First, a preliminary seed-area identification and a final re-segmentation process are performed on each video frame to aid region tracking. Second, a simple measure of texture homogeneity in a region is devised, and the segmentation tries to locate object boundaries where the texture shows significant changes. Third, a reduced-complexity motion estimation technique is used, so that dense motion fields can be computed at a reasonable cost. The overall method is organized into four tasks: seed-area identification (for each frame), initial segmentation (only for the first frame in the sequence), motion-based segmentation (for all later frames), and region tracking and updating (also for all later frames). Examples are provided to illustrate the performance of the method.
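The second design, a cheap texture-homogeneity measure used to place boundaries, can be sketched in 1-D. The paper's actual measure may differ; here homogeneity is simply intensity variance over a small window, and a boundary is declared where variance jumps between neighboring windows. The window size and jump threshold are illustrative assumptions.

```python
# Sketch of a cheap texture-homogeneity measure: homogeneity is the
# intensity variance of a window, and a boundary is placed where the
# variance jumps sharply between neighboring windows.

def variance(region):
    m = sum(region) / len(region)
    return sum((x - m) ** 2 for x in region) / len(region)

def boundary_index(row, win=4, jump=50.0):
    """Return the first sample index where texture changes sharply,
    or None if the row is homogeneous throughout."""
    vs = [variance(row[i:i + win])
          for i in range(0, len(row) - win + 1, win)]
    for i in range(1, len(vs)):
        if abs(vs[i] - vs[i - 1]) > jump:
            return i * win              # sample index of the change
    return None

# A flat area followed by a strongly textured area.
row = [100, 101, 99, 100] + [10, 200, 15, 190, 20, 210, 5, 180]
print(boundary_index(row))   # 4: boundary between flat and textured
```

Variance per window costs only a handful of operations, which is what keeps this boundary test compatible with the method's overall goal of reasonable computational complexity.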