ISBN: 9781665492577 (print)
This paper proposes a fast VVC coding unit (CU) partition algorithm based on an ensemble convolutional neural network (CNN) that investigates and bags spatial-temporal adjacent coding features. First, we propose an ensemble CNN framework that aggregates the reference features to predict the depths of uncoded CUs. The proposed model consists of three lightweight CNNs, which balance prediction accuracy against computational overhead. A majority voting mechanism then unifies the predicted depth: the outputs of the three CNNs are integrated by taking the majority prediction of the base learners as the final prediction. To avoid the rate-distortion (RD) loss caused by the small probability of prediction failure, we introduce an optimal depth strategy. During encoding, the optimal depth guides the CU partition decision, thus avoiding redundant rate-distortion optimization. Compared with the original encoder, the proposed algorithm saves 21.56% of encoding time on average, with a BDBR loss of 0.39%. The performance is even better on High-Definition (HD) and Ultra HD (UHD) sequences, with time savings of up to 59.52%. The approach reduces encoding time efficiently compared with state-of-the-art methods, with negligible RD performance loss.
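The majority voting step over three base learners can be illustrated with a minimal sketch. The function name `majority_vote` and the tie rule are assumptions for illustration only: the abstract does not say what happens when all three CNNs disagree, so this sketch falls back to the largest predicted depth.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the depth predicted by a strict majority of base learners.

    Assumption (not from the abstract): if no depth wins a strict
    majority, fall back to the largest predicted depth.
    """
    depth, votes = Counter(predictions).most_common(1)[0]
    if votes * 2 > len(predictions):
        return depth
    return max(predictions)

# Depths predicted by three hypothetical lightweight CNNs for one CU.
print(majority_vote([2, 2, 3]))
print(majority_vote([1, 2, 3]))
```

With three learners, a strict majority means at least two agree; the fallback only triggers when all three predictions differ.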
ISBN: 9781728192017 (print)
We have witnessed revolutionary progress in learned image compression despite the short history of the field. Some challenges remain, such as computational complexity, which prevents the practical application of learning-based codecs. In this paper, we address the issue of heavy time complexity from the perspective of arithmetic coding. Prevalent learning-based image compression schemes first map the natural image into latent representations and then conduct arithmetic coding on the quantized latent maps. Previous arithmetic coding schemes define the start and end values of the arithmetic codebook as the minimum and maximum over the whole set of latent maps, ignoring the fact that the value ranges in most channels are narrower. Hence, we propose to use a channel-adaptive codebook to accelerate arithmetic coding. We find that the latent channels have different frequency-related characteristics, which are verified by neural frequency filtering experiments. Further, the value ranges of the latent maps differ across channels and are relatively image-independent. These channel-adaptive characteristics allow us to establish efficient prior codebooks that cover more appropriate ranges and thereby reduce the runtime. Experimental results demonstrate that both arithmetic encoding and decoding can be accelerated while preserving the rate-distortion performance of the compression model.
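The difference between a single global codebook range and channel-adaptive ranges can be seen in a small sketch. The helper names and the toy latent values are hypothetical, and the ranges here are measured on one sample, whereas the abstract describes prior codebooks established in advance.

```python
def global_range(latents):
    """Minimum and maximum over all channels (the conventional codebook range)."""
    lo = min(min(channel) for channel in latents)
    hi = max(max(channel) for channel in latents)
    return lo, hi

def channel_ranges(latents):
    """Per-channel minimum and maximum (a channel-adaptive codebook range)."""
    return [(min(channel), max(channel)) for channel in latents]

def codebook_size(lo, hi):
    """Number of integer symbols a codebook covering [lo, hi] must hold."""
    return hi - lo + 1

# Toy quantized latents: three channels, flattened (hypothetical values).
latents = [[-1, 0, 1, 0], [-8, 3, 12, 0], [0, 0, 1, 0]]

g_lo, g_hi = global_range(latents)
global_total = codebook_size(g_lo, g_hi) * len(latents)
adaptive_total = sum(codebook_size(lo, hi) for lo, hi in channel_ranges(latents))
print(global_total, adaptive_total)
```

In this toy case only one channel needs the full range, so the per-channel codebooks together cover far fewer symbols than three copies of the global one.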
ISBN: 9781538662496 (print)
In the emerging video coding standard, Versatile Video Coding (VVC), a quadtree with a nested multi-type tree (MTT) using binary and ternary tree structures was proposed. MTT brings significant coding efficiency gains but increases the encoding complexity. In this paper, a look-ahead prediction based coding unit size pruning algorithm is proposed to cut down redundant MTT partitions. The proposed scheme aims to identify the unnecessary partition direction in advance and consists of two steps: SATD-based mode decision (SMD) for possible blocks, and refined cost derivation based on rate-distortion optimization. Experimental results show that the proposed method saves 41% of encoding time with only a 0.84% increase in bit rate on average.
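For readers unfamiliar with SATD (sum of absolute transformed differences), the following is a minimal sketch of the metric on a 4x4 block using a Hadamard transform. It illustrates the cost measure only, not the paper's SMD procedure; the function names are hypothetical, and the result is left unnormalized, whereas real encoders typically apply a scaling factor.

```python
# 4x4 Hadamard matrix; it is symmetric, so H * D * H equals H * D * H^T.
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]

def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def satd4x4(original, prediction):
    """Unnormalized SATD between a 4x4 original block and its prediction."""
    diff = [[original[i][j] - prediction[i][j] for j in range(4)]
            for i in range(4)]
    transformed = matmul4(matmul4(H4, diff), H4)
    return sum(abs(value) for row in transformed for value in row)

flat = [[10] * 4 for _ in range(4)]
shifted = [[11] * 4 for _ in range(4)]
print(satd4x4(flat, flat), satd4x4(shifted, flat))
```

A constant residual concentrates all of its energy in the single DC coefficient, which is why SATD is a cheaper proxy for coding cost than a full rate-distortion evaluation.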
ISBN: 9783030367183; 9783030367176 (print)
We propose a fast Bag-of-Words (BoW) method for image classification, inspired by the mechanism by which the arrangement of neurons in the visual cortex preserves the topology of the mapping from inputs, and by the fact that the human brain can retrieve information almost instantly. We propose algorithms for accelerating both Self-Organizing Map (SOM) training and BoW coding. First, we modify the traditional SOM based on the matrix factorization form of K-means. Then, utilizing the topology-preserving property of the dictionary learned by the SOM, the BoW coding process is accelerated by a fast search for the k-nearest neighbor codewords in the grid of the SOM dictionary. We evaluate the proposed method in different coding scenarios for the image classification task on the MNIST and CIFAR-10 datasets. The results show that the proposed method greatly accelerates BoW classification with little loss of classification accuracy.
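One plausible reading of the grid-based neighbor search is sketched below: because an SOM places similar codewords at adjacent grid positions, the k nearest codewords can be sought only in a small window around the best matching unit. The abstract does not specify the actual search procedure, so the function names, the `radius` parameter, and the exhaustive best-matching-unit lookup are all assumptions for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_matching_unit(grid, x):
    """Grid position of the codeword closest to x (exhaustive in this sketch)."""
    positions = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
    return min(positions, key=lambda rc: sq_dist(grid[rc[0]][rc[1]], x))

def grid_knn(grid, x, k, radius=1):
    """k nearest codewords to x, searched only in a window around the BMU."""
    r0, c0 = best_matching_unit(grid, x)
    rows, cols = len(grid), len(grid[0])
    window = [(r, c)
              for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1))
              for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1))]
    window.sort(key=lambda rc: sq_dist(grid[rc[0]][rc[1]], x))
    return window[:k]

# Toy 3x3 SOM whose codeword at grid position (r, c) is the vector (r, c).
grid = [[(float(r), float(c)) for c in range(3)] for r in range(3)]
print(grid_knn(grid, (0.1, 0.2), k=3))
```

The window search compares against at most (2 * radius + 1) squared codewords instead of the whole dictionary; how the starting unit is located quickly is left open here.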