3D high-efficiency video coding (3D-HEVC) is an extension of the HEVC standard for coding texture videos and depth maps. 3D-HEVC inherits the same quadtree coding structure as HEVC for both texture and depth components, in which the coding units (CUs) are recursively split into different sizes, namely, depth levels. However, this recursive CU splitting process incurs extensive computational complexity. To reduce this computational burden, this paper presents an adaptive CU size decision algorithm for texture videos and depth maps. The proposed algorithm consists of three steps. In the first step, the average local variance (ALV) is extracted for each CU size to characterize its homogeneity. Then, a classification-based gradient boosting machine (GBM) is employed to analyze the extracted ALV features and build a binary classification model, from which suitable thresholds for texture and depth-map CUs are derived. In the last step, a fast CU size decision is performed based on these adaptive thresholds for texture videos and depth maps. The experimental results show that the proposed algorithm significantly reduces encoding time, while the loss in coding efficiency is negligible.
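As an illustration of the first and third steps, the following is a minimal Python sketch of how an average-local-variance homogeneity measure could be computed per CU and compared against a learned threshold to decide whether to split; the 4 x 4 sub-block size, the function names and the example threshold are assumptions for illustration, not the paper's exact definitions.

import numpy as np

def average_local_variance(cu, block=4):
    """Average of the variances of non-overlapping block x block
    sub-blocks inside a CU (a rough homogeneity measure)."""
    h, w = cu.shape
    local_vars = [cu[y:y + block, x:x + block].var()
                  for y in range(0, h, block)
                  for x in range(0, w, block)]
    return float(np.mean(local_vars))

def decide_split(cu, threshold):
    """Early CU-size decision: a homogeneous CU (low ALV) is kept
    unsplit, a textured CU (high ALV) goes to the next depth level."""
    return average_local_variance(cu) > threshold

# toy usage with a hypothetical threshold learned offline by the GBM
cu64 = np.random.randint(0, 256, (64, 64)).astype(np.float64)
print("split 64x64 CU:", decide_split(cu64, threshold=120.0))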
The scalable extension of the high efficiency video coding standard, named SHVC, supports flexible access for various terminals in heterogeneous networks. However, it is difficult to use in real-time scenarios because of the high complexity of its hierarchical coding structure. In this paper, a novel method for SHVC inter-coding is proposed to reduce the coding complexity in a manner that is compatible with both quality scalability and spatial scalability. First, the depth range of each coding tree unit is estimated from a reference table generated from a statistical probability distribution based on the correlation between the current coding unit (CU) and its adjacent CUs. Within this depth range, a fast CU partitioning method based on Bayesian minimum risk and a fast prediction unit (PU) selection method based on Bayesian maximum probability are adopted to improve time efficiency. Three different methods, namely, histogram estimation, Gaussian modelling and neighbouring prediction, are used to calculate the conditional probabilities of discrete or continuous features in the Bayesian decision method. The significant advantage of the proposed method is that the time savings in the enhancement layer exceed 60% for each sequence, with negligible quality loss.
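The Bayesian minimum-risk rule used for the fast CU partitioning can be illustrated with a small Python sketch: given the posterior probability that the current CU should split, the encoder picks the action (terminate or split) with the smaller expected risk. The loss matrix values below are hypothetical placeholders, not the paper's trained costs.

import numpy as np

# Hypothetical loss matrix: rows = action (0: terminate, 1: split),
# columns = true class (0: CU should not split, 1: CU should split).
LOSS = np.array([[0.0, 4.0],   # wrongly terminating a CU that needed splitting is costly
                 [1.0, 0.0]])  # splitting an already-homogeneous CU only wastes time

def bayes_minimum_risk(p_split_given_x):
    """Pick the action with the smaller expected risk
    R(a | x) = sum_c LOSS[a, c] * P(c | x)."""
    posterior = np.array([1.0 - p_split_given_x, p_split_given_x])
    risks = LOSS @ posterior
    return int(np.argmin(risks))  # 0 = stop partitioning, 1 = keep splitting

print(bayes_minimum_risk(0.15))  # -> 0: terminate early
print(bayes_minimum_risk(0.40))  # -> 1: risk of wrongly stopping is too high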
3D-high efficiency video coding (3D-HEVC) is an extension of the high efficiency video coding (HEVC) standard for the compression of texture videos and depth maps. In 3D-HEVC inter-coding, the coding unit (CU) is recursively split into variable sizes, namely, depth levels. The CU size decision process evaluates all possible depth levels and selects the one with the least rate-distortion (RD) cost using the Lagrange multiplier. These tools achieve the highest coding efficiency but incur very high computational complexity. In this paper, a fast CU size decision algorithm is proposed to reduce the complexity caused by the CU size splitting process. The proposed algorithm classifies CU homogeneity using machine learning. First, a tensor feature is extracted to characterize the homogeneity of the CU, which is strongly related to the CU size. Then, a boosted decision stump algorithm is employed to analyze the extracted features, construct a binary classification model and find suitable thresholds for the proposed method. Finally, an efficient early termination of CU splitting is applied based on adaptive thresholds for texture videos and depth maps. The experimental results show that the proposed algorithm significantly reduces encoding time, while the loss in coding efficiency is negligible.
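A boosted decision stump classifier of the kind described above can be trained offline, e.g., with scikit-learn's AdaBoost, whose default base learner is a depth-1 tree (a stump). The sketch below uses a random scalar feature as a stand-in for the paper's tensor feature and a synthetic split/non-split label, purely to show the training and early-termination flow.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)

# Stand-in training data: one scalar homogeneity feature per CU and a
# binary label (1 = the reference encoder actually split that CU).
features = rng.uniform(0.0, 1.0, size=(2000, 1))
labels = (features[:, 0] + 0.1 * rng.normal(size=2000) > 0.5).astype(int)

# Boosted decision stumps: AdaBoost's default base estimator is a
# depth-1 decision tree, i.e. a stump.
stumps = AdaBoostClassifier(n_estimators=50)
stumps.fit(features, labels)

# At encoding time, a CU whose predicted probability of "split" is low
# terminates the recursive partitioning early.
p_split = stumps.predict_proba([[0.2]])[0, 1]
print("early termination:", p_split < 0.5)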
Multi-view video plus depth (MVD) is a mainstream format for 3D scene representation in free viewpoint video systems. The advanced 3D extension of the high efficiency video coding (3D-HEVC) standard introduces new prediction tools to improve the coding performance of depth video. However, depth video coding in 3D-HEVC is time consuming. To reduce the complexity of depth video inter-coding, we propose a fast coding unit (CU) size and mode decision algorithm. First, an off-line trained Bayesian model is built whose feature vector contains the depth levels of the corresponding spatial, temporal, and inter-component (texture-depth) neighboring largest CUs (LCUs). Then, the model is used to predict the depth level of the current LCU and terminate the recursive CU splitting process. Finally, the CU mode search process is terminated early by exploiting the mode correlation of spatial, inter-component (texture-depth), and inter-view neighboring CUs. Compared to the 3D-HEVC reference software HTM-10.0, the proposed algorithm reduces the encoding time of depth video and the total encoding time by 65.03% and 41.04% on average, respectively, with negligible quality degradation of the synthesized virtual view.
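A simple way to realize such an off-line trained Bayesian predictor is a naive-Bayes model over the neighbouring LCUs' depth levels, as sketched below; the choice of three neighbours, the Laplace smoothing and the toy training data are assumptions made for illustration.

import numpy as np

DEPTH_LEVELS = 4  # candidate maximum depth levels 0..3

def fit_counts(neighbor_feats, target_depths, alpha=1.0):
    """Offline stage: count how often each neighbour depth co-occurs
    with each target depth, with Laplace smoothing."""
    n_feats = neighbor_feats.shape[1]
    prior = np.bincount(target_depths, minlength=DEPTH_LEVELS) + alpha
    cond = np.full((n_feats, DEPTH_LEVELS, DEPTH_LEVELS), alpha)
    for feats, d in zip(neighbor_feats, target_depths):
        for i, f in enumerate(feats):
            cond[i, d, f] += 1
    prior /= prior.sum()
    cond /= cond.sum(axis=2, keepdims=True)
    return prior, cond

def predict_depth(prior, cond, feats):
    """Online stage: naive-Bayes posterior over the current LCU's
    maximum depth given its neighbouring LCUs' depth levels."""
    log_post = np.log(prior).copy()
    for i, f in enumerate(feats):
        log_post += np.log(cond[i, :, f])
    return int(np.argmax(log_post))

# toy usage: 3 neighbours (spatial, temporal, texture-collocated)
rng = np.random.default_rng(1)
X = rng.integers(0, DEPTH_LEVELS, size=(500, 3))
y = X.max(axis=1)  # pretend the target depth follows its neighbours
prior, cond = fit_counts(X, y)
print(predict_depth(prior, cond, feats=[1, 2, 1]))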
Video compression exploits statistical, spatial, and temporal redundancy, as well as transform and quantization. In particular, the transform to the frequency domain plays a major role in compacting the energy of spatial-domain data into the frequency domain. The high efficiency video coding standard uses the type-II discrete cosine transform (DCT-II) and type-VII discrete sine transform (DST-VII) to improve the coding efficiency of residual data. However, the DST-VII is applied only to the Intra 4 x 4 residual block because it yields relatively small gains in larger blocks than in the 4 x 4 block. In this study, after rearranging the data of the residual block, we apply the DST-VII to the inter residual block to achieve coding gain. The rearrangement of the residual block data follows the shape of the DST-VII basis vector with the lowest frequency component. Experimental results show that the proposed method reduces the luma and chroma (Cb+Cr) BD rates by approximately 0.23% and 0.22%, 0.44% and 0.58%, and 0.46% and 0.65% for the random access, low delay B, and low delay P configurations, respectively.
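For reference, the orthonormal DST-VII basis is T[k][i] = sqrt(4/(2N+1)) * sin(pi*(2i+1)*(k+1)/(2N+1)). The sketch below builds this matrix and applies a separable 2-D DST-VII to an inter residual block after a rearrangement step; a simple horizontal/vertical flip is used here as a stand-in for the paper's specific data reordering.

import numpy as np

def dst7_matrix(n):
    """Type-VII DST basis: T[k, i] = sqrt(4/(2n+1)) * sin(pi*(2i+1)*(k+1)/(2n+1))."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    return np.sqrt(4.0 / (2 * n + 1)) * np.sin(np.pi * (2 * i + 1) * (k + 1) / (2 * n + 1))

def transform_inter_residual(res, rearrange=True):
    """2-D separable DST-VII of a square residual block, optionally
    after a rearrangement step (a flip is used here as a stand-in for
    the paper's reordering toward the lowest-frequency basis shape)."""
    if rearrange:
        res = res[::-1, ::-1]
    t = dst7_matrix(res.shape[0])
    return t @ res @ t.T

res4 = np.arange(16, dtype=np.float64).reshape(4, 4) - 8.0  # toy inter residual
coeffs = transform_inter_residual(res4)
print(np.round(coeffs, 2))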
The rapid advancement of 3D sensing and rendering technologies has expanded the use of point clouds across various fields. To address the challenge of managing large point clouds, Point Cloud Compression (PCC) has gai...
As 3D scanning devices and depth sensors advance, dynamic point clouds have attracted increasing attention as a format for 3D objects in motion, with applications in various fields such as immersive telepresence, navigation for autonomous driving and gaming. Nevertheless, the tremendous amount of data in dynamic point clouds significantly burdens transmission and storage. To this end, we propose a complete compression framework for the attributes of 3D dynamic point clouds, focusing on optimal inter-coding. Firstly, we derive the optimal inter-prediction and predictive transform coding under a Gaussian Markov Random Field model defined on a spatio-temporal graph underlying the attributes of dynamic point clouds. The optimal predictive transform proves to be the Generalized Graph Fourier Transform in terms of spatio-temporal decorrelation. Secondly, we propose refined motion estimation via efficient registration prior to inter-prediction, which searches for the temporal correspondence between adjacent frames of irregular point clouds. Finally, we present a complete framework based on the optimal inter-coding and our previously proposed intra-coding, where the optimal coding mode is determined by rate-distortion optimization with the proposed offline-trained lambda-Q model. Experimental results show that we achieve around 17% bit rate reduction on average over competitive dynamic point cloud compression methods.
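The intra-frame building block of such a transform, a plain graph Fourier transform of point-cloud attributes, can be sketched as follows: build a k-nearest-neighbour graph with Gaussian edge weights over the point positions, take the eigendecomposition of the combinatorial Laplacian, and project the attributes onto its eigenvectors. The generalized, spatio-temporal version derived in the paper additionally involves the GMRF precision matrix across adjacent frames; the k, sigma and toy data below are illustrative assumptions.

import numpy as np
from scipy.spatial import cKDTree

def knn_graph_laplacian(points, k=8, sigma=0.1):
    """Combinatorial Laplacian of a k-NN graph with Gaussian weights."""
    n = len(points)
    tree = cKDTree(points)
    dists, idx = tree.query(points, k=k + 1)       # first neighbour is the point itself
    W = np.zeros((n, n))
    for i in range(n):
        for d, j in zip(dists[i, 1:], idx[i, 1:]):
            w = np.exp(-(d ** 2) / (2 * sigma ** 2))
            W[i, j] = W[j, i] = max(W[i, j], w)     # symmetrise
    return np.diag(W.sum(axis=1)) - W

def graph_fourier_transform(laplacian, attributes):
    """Project attributes (e.g. colour per point) onto the Laplacian
    eigenvectors; small eigenvalues <-> smooth (low-frequency) modes."""
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return eigvals, eigvecs.T @ attributes

rng = np.random.default_rng(0)
pts = rng.uniform(size=(200, 3))                      # toy point positions
attr = pts[:, :1] + 0.05 * rng.normal(size=(200, 1))  # toy smooth attribute
L = knn_graph_laplacian(pts)
eigvals, coeffs = graph_fourier_transform(L, attr)
print("energy in the 10 lowest-frequency coefficients:",
      float((coeffs[:10] ** 2).sum() / (coeffs ** 2).sum()))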
ISBN: (Print) 9781450379885
Due to the huge volume of point cloud data, storing or transmitting it is currently difficult and expensive in autonomous driving. Drawing on the high efficiency video coding (HEVC) framework, we propose an advanced coding scheme for large-scale LiDAR point cloud sequences, in which several techniques are developed to remove spatial and temporal redundancy. The proposed strategy consists mainly of intra-coding and inter-coding. For intra-coding, we utilize a cluster-based prediction method to remove spatial redundancy. For inter-coding, a predictive recurrent network is designed that is capable of generating future frames from the previously encoded frames. By computing the residual error between the predicted and real point cloud data, the temporal redundancy can be removed. Finally, the residual data is quantized and encoded with lossless coding schemes. Experiments are conducted on the KITTI data set with four different scenes to verify the effectiveness and efficiency of the proposed method. Our approach can handle multiple types of point cloud data, from simple to more complex, and yields better performance in terms of compression ratio than octree, Google Draco, MPEG TMC13 and other recently proposed methods.
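The predictive recurrent network itself is beyond a short sketch, but the residual step of the inter-coding can be illustrated as follows: subtract the predicted frame from the real one, quantize the residual with a uniform step, and reconstruct at the decoder from the same prediction. The step size and toy frames below are hypothetical.

import numpy as np

def encode_residual(real_frame, predicted_frame, step=0.02):
    """Inter-coding residual: quantize (real - predicted) with a uniform
    step; the integer indices would then go to an entropy coder."""
    residual = real_frame - predicted_frame
    return np.round(residual / step).astype(np.int32)

def decode_residual(predicted_frame, indices, step=0.02):
    """Reconstruct the frame at the decoder from the same prediction."""
    return predicted_frame + indices.astype(np.float64) * step

rng = np.random.default_rng(0)
real = rng.uniform(size=(1000, 3))                 # toy LiDAR frame (x, y, z)
pred = real + 0.01 * rng.normal(size=(1000, 3))    # stand-in for the network's prediction
idx = encode_residual(real, pred)
rec = decode_residual(pred, idx)
print("max reconstruction error:", float(np.abs(rec - real).max()))  # <= step / 2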
This letter proposes a method for losslessly coding the left disparity image, L, from a stereo disparity image pair (L, R), conditional on the right disparity image, R, by keeping track of the transformation of the constant patches from R to L. The disparities in R are used for predicting the disparities in L, and the locations of the pixels where the prediction is erroneous are encoded in a first stage, conditional on the patch labels of the R image, allowing the decoder to already reconstruct with certainty some elements of the L image, e.g., the disparity values at certain pixels and parts of the contours of the left image patches. Second, the contours of the patches in the L image that are still unknown after the first stage are conditionally encoded using a mixed conditioning context: the usual causal context from the contours of L and a noncausal context extracted from the contours in the correctly estimated part of L obtained in the first stage. The depth values in the patches of the L image are finally encoded, if they are not already known from the prediction stage. The new algorithm, dubbed conditional crack-edge region value (C-CERV), is shown to perform significantly better than the non-conditional coding method CERV and than another existing conditional coding method on the Middlebury corpus. C-CERV reaches lossless compression ratios of 100-250 times for images with a high-precision disparity map.
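The first-stage prediction of L from R can be sketched as a disparity warp: each pixel of R is projected into the left view by its own disparity, and the pixels where the warped value disagrees with the true L are exactly the locations the first stage must signal. The sign convention of the warp and the tiny example maps below are assumptions for illustration.

import numpy as np

def predict_left_from_right(disp_right):
    """Warp the right disparity map into the left view: a pixel at
    column x in R with disparity d is predicted at column x + d in L
    (sign convention assumed; unfilled/occluded pixels stay at -1)."""
    h, w = disp_right.shape
    pred_left = np.full((h, w), -1, dtype=disp_right.dtype)
    for y in range(h):
        for x in range(w):
            d = disp_right[y, x]
            xl = x + int(d)
            if 0 <= xl < w:
                # keep the nearer surface when several pixels land here
                pred_left[y, xl] = max(pred_left[y, xl], d)
    return pred_left

def prediction_error_mask(disp_left, pred_left):
    """Pixels the decoder cannot infer from R alone; their locations
    are what the first coding stage has to signal."""
    return pred_left != disp_left

# toy usage: warp a tiny right disparity map and compare with a true L
R = np.array([[2, 2, 2, 5, 5],
              [2, 2, 2, 5, 5]])
true_L = np.array([[-1, -1, 2, 2, 2],
                   [-1, -1, 2, 2, 5]])   # hypothetical ground truth
pred_L = predict_left_from_right(R)
print(pred_L)
print("pixels needing correction:", int(prediction_error_mask(true_L, pred_L).sum()))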