Linear transformations are highly relevant to data decorrelation applications such as image and video compression. In this context, the discrete Tchebichef transform (DTT) possesses useful coding and decorrelation properties. The DTT kernel does not depend on the input data, and fast algorithms can be developed for real-time applications. However, the DTT fast algorithms presented in the literature have high computational complexity. In this paper, we introduce a new low-complexity approximation for the DTT. The fast algorithm of the proposed transform is multiplication-free and requires a reduced number of additions and bit-shifting operations. Image and video compression simulations in popular standards show good performance of the proposed transform. Regarding hardware resource consumption, the FPGA realization shows a 43.1% reduction in configurable logic blocks, and the ASIC place-and-route realization shows a 57.7% reduction in the area-time figure compared with the 2D version of the exact DTT.
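As a rough illustration of the exact DTT that this abstract approximates, the sketch below builds an orthonormal N-point kernel by orthogonalizing the monomials 1, x, x^2, ... over the sample points 0..N-1, which yields the discrete Tchebichef polynomials up to sign conventions. The function names (`dtt_matrix`, `transform`, `inverse`) are hypothetical, and this is the plain floating-point transform, not the paper's multiplication-free approximation or its fast algorithm.

```python
import math

def dtt_matrix(n):
    """Orthonormal n-point DTT kernel (illustrative construction, not the paper's)."""
    rows = []
    for degree in range(n):
        # Start from the monomial x**degree sampled at x = 0, 1, ..., n-1.
        v = [float(x) ** degree for x in range(n)]
        # Two Gram-Schmidt passes against the lower-degree rows for numerical robustness.
        for _ in range(2):
            for r in rows:
                dot = sum(a * b for a, b in zip(v, r))
                v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows

def transform(matrix, signal):
    """Forward transform: coefficients = matrix * signal."""
    return [sum(m * s for m, s in zip(row, signal)) for row in matrix]

def inverse(matrix, coeffs):
    """Inverse transform: for an orthonormal kernel this is the transpose."""
    n = len(coeffs)
    return [sum(matrix[k][i] * coeffs[k] for k in range(n)) for i in range(n)]

if __name__ == "__main__":
    T = dtt_matrix(8)
    x = [12.0, 14.0, 15.0, 15.0, 17.0, 20.0, 24.0, 25.0]  # made-up smooth signal
    y = transform(T, x)
    print([round(c, 3) for c in y])
    print([round(v, 6) for v in inverse(T, y)])
```

The kernel depends only on N, never on the signal, which is the data-independence property the abstract refers to. For a smooth input such as the made-up one above, most of the energy is expected to land in the low-order coefficients.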
The discrete cosine transform (DCT) is a relevant tool in signal processing applications, mainly known for its good decorrelation properties. Current image and video coding standards, such as JPEG and HEVC, adopt the DCT as a fundamental building block for compression. Recent works have introduced low-complexity approximations for the DCT, which become paramount in applications demanding real-time computation and low power consumption. The design of DCT approximations involves a trade-off between computational complexity and performance. This paper introduces a new multiparametric transform class encompassing the round-off DCT (RDCT) and the modified RDCT (MRDCT), two relevant multiplierless 8-point approximate DCTs. The associated fast algorithm is provided. Four novel orthogonal low-complexity 8-point DCT approximations are obtained by solving a multicriteria optimization problem. The optimal 8-point transforms are scaled to lengths 16 and 32 while keeping the arithmetic complexity low. The proposed methods are assessed by proximity and coding measures with respect to the exact DCT. Image and video coding experiments and hardware realization are performed. The novel transforms perform close to or outperform the current state-of-the-art DCT approximations.
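To make the idea of a multiplierless DCT approximation concrete, here is a minimal sketch of a round-off construction. It assumes the RDCT is obtained by rounding each entry of twice the orthonormal 8-point DCT matrix to the nearest integer, which is how the round-off DCT is usually described in the literature but is not stated in this abstract. The helper names are hypothetical, and the multiparametric class and optimized transforms proposed in the paper are not reproduced here.

```python
import math

def dct_matrix(n=8):
    """Orthonormal n-point DCT-II matrix."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows

def round_off(matrix, factor=2.0):
    """Round factor * entry to the nearest integer (assumed RDCT-style construction)."""
    return [[int(round(factor * c)) for c in row] for row in matrix]

def gram(matrix):
    """Return matrix * matrix^T, used to check row orthogonality."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in matrix] for r1 in matrix]

if __name__ == "__main__":
    T = round_off(dct_matrix(8))
    for row in T:
        print(row)
    # Diagonal Gram matrix means the rows are mutually orthogonal.
    print([gram(T)[k][k] for k in range(8)])
```

Every entry of the rounded matrix lies in {-1, 0, 1}, so applying it needs only additions and subtractions. Because the rows stay mutually orthogonal, an orthonormal transform follows from a diagonal rescaling, which in a codec can typically be absorbed into the quantization step.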
Machine visual intelligence has exploded in recent years. Large-scale, high-quality image and video datasets significantly empower learning-based machine vision models, especially deep-learning models. However, images and videos are usually compressed before being analyzed in practical situations where transmission or storage is limited, leading to a noticeable performance loss of vision models. In this work, we broadly investigate the impact of image and video coding on the performance of machine vision. Based on the investigation, we propose Just Recognizable Distortion (JRD) to represent the maximum distortion caused by data compression that will reduce the machine vision model performance to an unacceptable level. A large-scale JRD-annotated dataset containing over 340,000 images is built for various machine vision tasks, where the factors for different JRDs are studied. Furthermore, an ensemble-learning-based framework, consisting of multiple binary classifiers to improve prediction accuracy, is established to predict the JRDs for diverse vision tasks under few- and non-reference conditions. Experiments demonstrate that the proposed JRD-guided image and video coding significantly improves compression and machine vision performance. Applying the predicted JRD achieves remarkably better machine vision task accuracy and saves a large number of bits.
In this letter, we introduce a low-complexity approximation for the discrete Tchebichef transform (DTT). The proposed forward and inverse transforms are multiplication-free and require a reduced number of additions and bit-shifting operations. Numerical compression simulations demonstrate the efficiency of the proposed transform for image and video coding. Furthermore, a Xilinx Virtex-6 FPGA-based hardware realization shows a 44.9% reduction in dynamic power consumption and 64.7% lower area when compared to the literature.
Current DCT-based video coding standards suffer, especially at low bit rates, from the disadvantage that coarse quantization and a rigid block structure result in noticeable blocking and ringing noise. In this paper we propose a spatially adaptive method for reducing these coding artifacts based on the principle of constrained least-squares image restoration. A strictly local filter is developed which adapts to the spatial image characteristics as well as to the coding conditions. Due to the small filter kernel, the method can be applied to frame-based as well as object-based video coding and is also suited for quality improvement in arbitrarily shaped MPEG-4 coded video material. The proposal is numerically scalable and yields visually pleasing results for intra- as well as interframe coded images. This is also reflected by consistent PSNR improvements between 0.2 and 1.3 dB. As a post-processing technique, it is compatible with all existing image and video coding standards. (C) 2000 Elsevier Science B.V. All rights reserved.
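For readers unfamiliar with the PSNR figures quoted above, the following sketch shows how such a gain is typically computed: PSNR = 10 log10(peak^2 / MSE), evaluated once for the decoded image and once for the post-processed image against the same original. The function name `psnr`, the 8-bit peak value of 255, and the pixel values are illustrative assumptions, and the snippet does not implement the constrained least-squares filter itself.

```python
import math

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)

if __name__ == "__main__":
    # Made-up pixel values, flattened to one dimension for brevity.
    original = [100, 102, 104, 106, 108, 110, 112, 114]
    decoded = [104, 104, 104, 104, 112, 112, 112, 112]   # blocky reconstruction
    filtered = [102, 103, 104, 106, 109, 111, 112, 113]  # hypothetical deblocked result
    gain = psnr(original, filtered) - psnr(original, decoded)
    print(f"PSNR gain: {gain:.2f} dB")
```

A gain in the 0.2 to 1.3 dB range reported in the abstract corresponds to reducing the mean squared error by roughly 5% to 26%.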
A considerable amount of facial data is compressed prior to analysis to accommodate limitations in transmission or storage capacity. However, this compression may lead to the loss of crucial identity details, thereby diminishing the effectiveness of facial recognition (FR) systems. In this study, we aim to establish an optimal accuracy-rate equilibrium (ARE) to maximize the compression ratio without substantially compromising the performance of the FR system. We first investigate the effect of image compression in deep FR. Based on the definition of ARE in FR, we propose a method to locate the ARE values of face images using the true acceptance rate and false acceptance rate. Subsequently, we develop an ARE prediction method for the FR system (ARE-FR), which automatically infers the ARE images of face images. The goal of the proposed ARE-FR is to maximize redundancy removal without impairing robust identity information. Considering that high-level semantic features effectively capture crucial identity information, we force the proposed ARE-FR to focus only on the features in identity-related regions when predicting the ARE images. These features are derived from the interactive relationships between deep and shallow features. Experimental results demonstrate that combining the proposed ARE-FR with an image coding algorithm saves more bits while maintaining the performance of the FR system.
This paper proposes a unified one-dimensional (1-D) coding framework for image and video, which relies on a deep neural network and image patch clustering. First, an improved K-means clustering algorithm for image patches is employed to obtain compact inputs for the deep artificial neural network. Second, to best reconstruct the original image patches, the deep linear autoencoder (DLA), a linear version of the classical deep nonlinear autoencoder, is introduced to achieve the 1-D representation of image blocks. With a 1-D representation, DLA is capable of attaining zero reconstruction error, which is impossible for the classical nonlinear dimensionality reduction methods. Third, a unified 1-D coding infrastructure for image, intraframe, interframe, multiview video, three-dimensional (3-D) video, and multiview 3-D video is built by incorporating different categories of videos into the inputs of the patch clustering algorithm. Finally, simulation experiments show that the proposed methods simultaneously achieve a higher compression ratio and peak signal-to-noise ratio than the state-of-the-art methods in low-bitrate transmission. (C) 2017 SPIE and IS&T
Wyner-Ziv (WZ) video coding is a particular case of distributed video coding (DVC), the recent video coding paradigm based on the Slepian-Wolf and Wyner-Ziv theorems, which exploits the source temporal correlation at the decoder and not at the encoder as in predictive video coding. Although some progress has been made in recent years, WZ video coding is still far from the compression performance of predictive video coding, especially for content with high and complex motion. The WZ video codec adopted in this study is based on a transform-domain WZ video coding architecture with feedback-channel-driven rate control, whose modules have been improved with some recent coding tools. This study proposes a novel motion learning approach to successively improve the rate-distortion (RD) performance of the WZ video codec as decoding proceeds, making use of the already decoded transform bands to improve the decoding process for the remaining transform bands. The results reveal gains of up to 2.3 dB in the RD curves over the same codec without the proposed motion learning approach for high-motion sequences and long group of pictures (GOP) sizes.
The authors introduce residue-free video coding, in which motion-compensated predictions from surrounding frames and spatial predictions from the current frame are combined adaptively on a pixel-by-pixel basis. The consequence is that residue frames, blocks or regions are never explicitly formed. The authors describe a practical embodiment of a residue-free coder, temporal prediction trees, in which the local adaptation is conditioned frame to frame by a control parameter derived from global motion statistics. Using fixed-block-size motion compensation, the resulting coder is competitive with conventional residue-based compression, and at higher data rates is able to outperform H.264/AVC for high-activity sequences.
The authors present a general framework for constructing biorthogonal wavelets based on Bernstein bases, along with theoretical analysis and an application. The presented framework possesses the largest possible regularity, the required vanishing moments and passband flatness in the frequency response of the filters. Based on this concept, the authors establish explicit formulas for filters of biorthogonal wavelets with arbitrary odd lengths. In addition, a new family of symmetric parametric filters is constructed. The choice of filter bank in wavelet compression is a critical issue that affects image quality. In this study, an optimization model of finite impulse response (FIR) filters aimed at image compression is put forward, and the optimal FIR filters are obtained through sequential quadratic programming (SQP) and a genetic algorithm (GA). The authors demonstrate the performance of the new family of filters for image compression, with very encouraging results.