In 2021, the newest MPEG standard was published as MPEG-5 low complexity enhancement video coding (LCEVC). Unlike typical video codecs, LCEVC is an enhancement codec, meaning it works in combination with other codecs to produce more efficiently compressed video. Thanks to its simplified architecture, it is designed to be deployed as a software enhancer that uses hardware blocks more efficiently. Despite being relatively new, it has already been adopted for a major next-generation television system (TV 3.0 in Brazil) and is being deployed across a full spectrum of applications, from broadcast to broadband. In this article, we focus on future applications of LCEVC, from high dynamic range, 8K, and immersive video to the metaverse, explaining how this new standard can make a positive impact on these applications.
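The enhancement-layer idea can be illustrated with a minimal sketch, assuming a generic base codec exposed through hypothetical base_encode/base_decode callables (not the actual LCEVC API): the source is downscaled and coded by the base codec, and the enhancement layer carries the residual between the source and the upscaled base reconstruction.

```python
import numpy as np

def lcevc_style_encode(frame, base_encode, base_decode, scale=2):
    """Illustrative enhancement-codec layering: encode a downscaled base,
    then code the residual between the source and the upscaled base."""
    # Downscale the source frame and encode it with the base codec.
    low = frame[::scale, ::scale]                   # naive decimation for illustration
    base_bitstream = base_encode(low)
    # Reconstruct what the decoder will see and upscale it back.
    base_recon = base_decode(base_bitstream)
    upscaled = np.kron(base_recon, np.ones((scale, scale)))  # naive upsampling
    # The enhancement layer carries the residual details (it would then be
    # transformed, quantized, and entropy-coded).
    residual = frame.astype(np.int16) - upscaled.astype(np.int16)
    return base_bitstream, residual
```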
With the emergence of light field imaging in recent years, the compression of its elementary image array (EIA) has become a significant problem. Our coding framework includes modeling and reconstruction. For the modeling, the covariance-matrix form of the 4D Epanechnikov kernel (4D EK) and its correlated statistics were deduced to obtain the 4D Epanechnikov mixture models (4D EMMs). A 4D Epanechnikov mixture regression (4D EMR) was proposed based on this 4D EK, and a 4D adaptive model selection (4D AMLS) algorithm was designed to realize optimal modeling of the pseudo video sequence (PVS) of the extracted key-EIA. A linear function-based reconstruction (LFBR) was proposed based on the correlation between adjacent elementary images (EIs). The decoded images showed clear outline reconstruction and superior coding efficiency compared to high-efficiency video coding (HEVC) and JPEG 2000 below approximately 0.05 bpp. This work realized an unprecedented theoretical application by (1) proposing the 4D Epanechnikov kernel theory, (2) proposing the 4D Epanechnikov mixture regression and applying it to the modeling of the pseudo video sequence of light field images, (3) using 4D adaptive model selection to choose the optimal number of models, and (4) employing a linear function-based reconstruction according to the content similarity.
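For reference, a minimal sketch of the covariance-matrix form of the multivariate Epanechnikov kernel and the corresponding mixture density, in standard textbook notation (the paper's exact parameterization may differ); here d = 4 and c_d denotes the normalizing constant.

```latex
K_{\Sigma}(\mathbf{x}) =
\begin{cases}
  c_d\,|\Sigma|^{-1/2}\left(1 - (\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right),
    & (\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}) \le 1,\\[4pt]
  0, & \text{otherwise,}
\end{cases}
\qquad
f(\mathbf{x}) = \sum_{k=1}^{K} \pi_k\,K_{\Sigma_k}(\mathbf{x}), \quad \sum_{k}\pi_k = 1.
```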
This article introduces the ISO/IEC MPEG Immersive Video (MIV) standard, MPEG-I Part 12, which is undergoing standardization. The draft MIV standard provides support for viewing immersive volumetric content captured by multiple cameras with six degrees of freedom (6DoF) within a viewing space that is determined by the camera arrangement in the capture rig. The bitstream format and decoding processes of the draft specification along with aspects of the Test Model for Immersive Video (TMIV) reference software encoder, decoder, and renderer are described. The use cases, test conditions, quality assessment methods, and experimental results are provided. In the TMIV, multiple texture and geometry views are coded as atlases of patches using a legacy 2-D video codec, while optimizing for bitrate, pixel rate, and quality. The design of the bitstream format and decoder is based on the visual volumetric video-based coding (V3C) and video-based point cloud compression (V-PCC) standard, MPEG-I Part 5.
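A minimal sketch of the patch-to-view copying that atlas-based coding implies, with hypothetical patch metadata fields (atlas position, view position, size); this is illustrative only and not the normative MIV reconstruction process.

```python
import numpy as np

def unpack_patches(atlas, patches, view_shape):
    """Copy rectangular patches from a decoded atlas back into a view.

    Each patch dict is assumed (hypothetically) to carry:
      'atlas_xy': top-left corner in the atlas,
      'view_xy' : top-left corner in the source view,
      'size'    : (height, width) of the patch.
    """
    view = np.zeros(view_shape, dtype=atlas.dtype)
    for p in patches:
        ay, ax = p['atlas_xy']
        vy, vx = p['view_xy']
        h, w = p['size']
        view[vy:vy + h, vx:vx + w] = atlas[ay:ay + h, ax:ax + w]
    return view
```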
We observe that data access and processing take a significant amount of time in large-scale deep learning training tasks (DLTs) on image datasets. Three factors contribute to this problem: (1) the massive and recurrent accesses to large numbers of small files; (2) the repeated, expensive decoding computation on each image; and (3) the frequent communication between computation nodes and storage nodes. Existing work has addressed some aspects of these problems; however, no end-to-end solutions have been proposed. In this article, we propose DIESEL+, an all-in-one system which accelerates the entire I/O pipeline of deep learning training tasks. DIESEL+ contains several components: (1) local metadata snapshot; (2) per-task distributed caching; (3) chunk-wise shuffling; (4) GPU-assisted image decoding; and (5) online region-of-interest (ROI) decoding. The metadata snapshot removes the bottleneck on metadata access when frequently reading large numbers of files. The per-task distributed cache spans the worker nodes of a DLT task to reduce the I/O pressure on the underlying storage. The chunk-based shuffle method converts small file reads into large chunk reads, so that performance is improved without sacrificing training accuracy. The GPU-assisted image decoding and the online ROI method minimize the image decoding workloads and reduce the cost of data movement between nodes. These techniques are seamlessly integrated into the system. In our experiments, DIESEL+ outperforms existing systems by a factor of two to three in overall training time.
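A minimal sketch of the chunk-wise shuffling idea (generic helper, not DIESEL+'s actual API): sample order is randomized at two levels, so storage sees large sequential chunk reads rather than many small file reads while training still receives a shuffled stream.

```python
import random

def chunkwise_shuffle(sample_ids, chunk_size, seed=0):
    """Group samples into fixed-size chunks, shuffle the chunk order,
    then shuffle samples inside each chunk. Reads stay chunk-sequential."""
    rng = random.Random(seed)
    chunks = [sample_ids[i:i + chunk_size]
              for i in range(0, len(sample_ids), chunk_size)]
    rng.shuffle(chunks)                  # coarse-grained randomness across chunks
    order = []
    for chunk in chunks:
        chunk = list(chunk)
        rng.shuffle(chunk)               # fine-grained randomness within a chunk
        order.extend(chunk)
    return order
```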
JPEG-domain enhancement improves the visual quality of JPEG images by directly manipulating the decoded DCT (discrete cosine transform) coefficients, which inevitably leads to mixed compression and enhancement artifacts. Existing forensic methods that consider only JPEG artifacts are ill-suited to such mixed artifacts: they suffer a considerable performance decline in compression parameter estimation and cannot estimate the enhancement parameter. This work attempts to characterize the mixed artifacts and to further estimate both the enhancement and compression parameters of JPEG-domain enhanced images. First, a statistical likelihood function is proposed to characterize the periodicity of DCT coefficients; it measures how well an enhanced image is de-enhanced back to its JPEG compressed version given the compression and enhancement parameters. The proposed likelihood function reaches its maximum when the parameters match their true values. Then, a forensic method for enhancement detection and parameter estimation is developed based on the proposed likelihood function for two kinds of classical JPEG-domain enhancement. Specifically, JPEG-domain enhanced images are detected by thresholding a scalar feature computed upon the likelihoods, and the enhancement and compression parameters are estimated by locating the maximal likelihood. In addition, a mathematical proof of the de-enhancement feasibility is provided. Experimental results demonstrate that the proposed method outperforms the compared methods in both enhancement detection and parameter estimation.
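A minimal sketch of the estimation loop the abstract describes, with placeholder functions: de-enhance the image under each candidate parameter pair and keep the pair that maximizes a periodicity-based score of the resulting DCT coefficients. Here de_enhance and periodicity_likelihood are hypothetical stand-ins for the paper's actual operators, not their definitions.

```python
import itertools

def estimate_parameters(image, enhancement_grid, quality_grid,
                        de_enhance, periodicity_likelihood):
    """Grid-search the (enhancement, compression) parameter pair that best
    explains the observed DCT statistics of a JPEG-domain enhanced image."""
    best, best_score = None, float('-inf')
    for e, q in itertools.product(enhancement_grid, quality_grid):
        candidate = de_enhance(image, e)               # undo the hypothesized enhancement
        score = periodicity_likelihood(candidate, q)   # how JPEG-like (quality q) it looks
        if score > best_score:
            best, best_score = (e, q), score
    return best, best_score
```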
A large body of forensic research focuses on operation detection to reveal evidence of forgery in digital images. In early works, analysts first modeled the probability distribution of a single operation and designed forensic tools based on feature extraction and machine-learning classifiers. As feature dimensionality grows and multiple-operation detection scenarios arise, the physical meaning of the features gradually becomes ambiguous. In particular, since deep learning was introduced into forensic research, automatic feature selection and high-accuracy decision making have concealed the intrinsic forensic clues. In this paper, we explore the availability of features for operation detection in an operation chain, which we call forensicability. An anti-forensic attack algorithm is introduced to formulate the impact of the following operation on the features. We propose two measurements, namely attack angle and scale and a mutual information scale, to indicate how the forensic features vary after the image is manipulated by the following operation. The uncoupled relationship between operations can be revealed by our methods. In the experiments, four operation chains involving ten operations are considered as case studies. The results are encouraging and improve the explainability of forensic methods based on high-dimensional features.
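A minimal sketch, under the assumption that the angle and scale compare the mean feature shift caused by the target operation alone with the shift observed after the following operation, and that the mutual-information scale is estimated between features and operation labels; this is a generic approximation, not the paper's exact definitions.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def attack_angle_and_scale(feat_before, feat_after_op, feat_after_chain):
    """Angle (degrees) and relative magnitude between the feature shift caused
    by the operation alone and the shift after the following operation."""
    d_op = feat_after_op.mean(axis=0) - feat_before.mean(axis=0)
    d_chain = feat_after_chain.mean(axis=0) - feat_before.mean(axis=0)
    cos = np.dot(d_op, d_chain) / (np.linalg.norm(d_op) * np.linalg.norm(d_chain))
    angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    scale = np.linalg.norm(d_chain) / np.linalg.norm(d_op)
    return angle, scale

def mutual_information_scale(features, labels):
    """Average mutual information between feature dimensions and operation labels."""
    return mutual_info_classif(features, labels).mean()
```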
Steganalysis in real-world applications often exhibits a skewed sample distribution, which poses a massive challenge for steganography detection. Conventional steganalysis algorithms are not effective when the training data distribution is imbalanced and may fail in such scenarios. To address the imbalanced data distribution issue in steganalysis, a novel framework termed adaptive cost-sensitive feature learning via F-measure maximization is proposed, inspired by the fact that the F-measure is a more suitable performance metric than accuracy for imbalanced data. We investigate an adaptive cost-sensitive strategy that generates and assigns a different weight to each misclassified instance. This scheme adaptively determines the weights according to the intra-class and inter-class costs arising from the imbalanced distribution. Features corresponding to the largest F-measure can be obtained by solving a series of adaptive cost-sensitive feature learning problems with optimization theory. In this way, the learned features are the most representative features for distinguishing cover and stego images, so the imbalance problem in steganalysis is significantly alleviated. Extensive experiments on various imbalanced steganalysis tasks show the superiority of the proposed method over state-of-the-art methods; it recognizes more minority samples and has excellent classification performance.
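For context, the F-measure being maximized is the standard one; a hedged sketch of the objective follows, with w_i denoting the adaptive per-instance misclassification costs (notation assumed for illustration, not taken from the paper).

```latex
F_{\beta} = \frac{(1+\beta^{2})\,P\,R}{\beta^{2}P + R},
\qquad
P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN},
\qquad
\min_{\theta}\ \sum_{i} w_i\,\ell\bigl(y_i, f_{\theta}(x_i)\bigr),
```

where the weights w_i are set larger for minority-class misclassifications so that maximizing F_beta and minimizing the weighted loss are aligned.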
Adaptive Fourier decomposition (AFD) is a newly developed signal processing tool that can adaptively decompose any single signal using a Szegő kernel dictionary. To process multiple signals, a novel stochastic-AFD (SAFD) theory was recently proposed. The innovation of this study is twofold. First, an SAFD-based general multi-signal sparse representation learning algorithm is designed and implemented for the first time in the literature, which can be used in many signal and image processing areas. Second, a novel SAFD-based image compression framework is proposed. The algorithm design and implementation of the SAFD theory and image compression methods are presented in detail. The proposed compression methods are compared with 13 other state-of-the-art compression methods, including JPEG, JPEG2000, BPG, and other popular deep learning-based methods. The experimental results show that our methods achieve the best balanced performance. The proposed methods are based on single-image adaptive sparse representation learning and require no pre-training. In addition, the decompression quality or compression efficiency can be easily adjusted by a single parameter, namely the decomposition level. Our method is supported by a solid mathematical foundation, which has the potential to become a new core technology in image compression.
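A minimal sketch of the greedy, level-by-level selection that adaptive decomposition methods of this kind perform: a generic matching-pursuit-style loop over a dictionary of unit-norm atoms. The actual SAFD atoms are parameterized Szegő kernels and its selection rule differs in detail; the sketch only illustrates how the decomposition level acts as the single quality/rate parameter.

```python
import numpy as np

def greedy_decompose(signal, dictionary, levels):
    """Generic greedy sparse decomposition: at each level pick the dictionary
    atom most correlated with the residual and subtract its projection.
    `dictionary` is an (n_atoms, n_samples) array of unit-norm atoms."""
    residual = signal.astype(float).copy()
    selected = []                             # (atom index, coefficient) pairs
    for _ in range(levels):
        corr = dictionary @ residual          # correlation with every atom
        k = int(np.argmax(np.abs(corr)))      # maximal selection principle
        coef = corr[k]
        residual -= coef * dictionary[k]
        selected.append((k, coef))
    return selected, residual
```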
In general, image restoration involves mapping from low-quality images to their high-quality counterparts. Such an optimal mapping is usually nonlinear and learnable by machine learning. Recently, deep convolutional neural networks have proven promising for learning such mappings. It is desirable for an image processing network to handle three vital tasks well, namely: 1) super-resolution; 2) denoising; and 3) deblocking. It is commonly recognized that these tasks have strong correlations, which enables us to design a general framework supporting all of them. In particular, the selection of feature scales is known to significantly impact the performance on these tasks. To this end, we propose the cross-scale residual network to exploit scale-related features among the three tasks. The proposed network can extract spatial features across different scales and establish cross-temporal feature reusage, so as to handle different tasks in a general framework. Our experiments show that the proposed approach outperforms state-of-the-art methods in both quantitative and qualitative evaluations for multiple image restoration tasks.
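A minimal, hypothetical PyTorch sketch of a cross-scale residual block: features are processed at the original and a downsampled scale, fused, and added back residually. This illustrates the general idea only, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossScaleResidualBlock(nn.Module):
    """Processes features at two scales and fuses them with a residual connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv_full = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv_half = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        full = F.relu(self.conv_full(x))                        # full-resolution branch
        half = F.interpolate(x, scale_factor=0.5, mode='bilinear',
                             align_corners=False)
        half = F.relu(self.conv_half(half))                     # half-resolution branch
        half = F.interpolate(half, size=x.shape[-2:], mode='bilinear',
                             align_corners=False)
        return x + self.fuse(torch.cat([full, half], dim=1))    # residual fusion
```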
We present a simple and effective approach for non-blind image deblurring, combining classical techniques and deep learning. In contrast to existing methods that deblur the image directly in the standard image space, we propose to perform an explicit deconvolution process in a feature space by integrating a classical Wiener deconvolution framework with learned deep features. A multi-scale cascaded feature refinement module then predicts the deblurred image from the deconvolved deep features, progressively recovering detail and small-scale structures. The proposed model is trained in an end-to-end manner and evaluated on scenarios with simulated Gaussian noise, saturated pixels, or JPEG compression artifacts as well as real-world images. Moreover, we present detailed analyses of the benefit of the feature-based Wiener deconvolution and of the multi-scale cascaded feature refinement as well as the robustness of the proposed approach. Our extensive experimental results show that the proposed deep Wiener deconvolution network facilitates deblurred results with visibly fewer artifacts and quantitatively outperforms state-of-the-art non-blind image deblurring methods by a wide margin.
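The classical Wiener step at the core of the method can be sketched as follows: a plain single-channel frequency-domain Wiener filter with a scalar noise-to-signal ratio. In the paper the deconvolution operates on learned multi-channel deep features rather than pixels, so this is only the underlying textbook operation.

```python
import numpy as np

def wiener_deconvolve(blurred, kernel, nsr=1e-2):
    """Frequency-domain Wiener deconvolution of a single channel.
    `nsr` is the assumed noise-to-signal power ratio."""
    H = np.fft.fft2(kernel, s=blurred.shape)     # blur kernel spectrum (zero-padded)
    Y = np.fft.fft2(blurred)
    # Wiener filter: conj(H) / (|H|^2 + NSR), applied to the blurred spectrum.
    X = np.conj(H) * Y / (np.abs(H) ** 2 + nsr)
    return np.real(np.fft.ifft2(X))
```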