ISBN: (print) 9781728173221
In video coding, compressing high-frequency components, including noise and visually imperceptible content, remains an intractable problem: such content consumes a large amount of bandwidth while providing limited quality improvement. Directly applying denoising methods degrades coding performance and is therefore unsuitable for video coding scenarios. In this work, we propose a video pre-processing approach that leverages an edge-preserving filter specifically designed for video coding, whose parameters are optimized in terms of rate-distortion (R-D) performance. The proposed pre-processing method removes components with low R-D cost-effectiveness before encoding while keeping important structural components, leading to higher coding efficiency and better subjective quality. Compared with conventional denoising filters, the proposed pre-processing method using the R-D optimized edge-preserving filter improves coding efficiency by up to 5.2% in BD-rate (a BD-rate of −5.2%) with low computational complexity.
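As a rough illustration of what optimizing filter parameters "in terms of R-D performance" can mean, the sketch below picks the filter strength that minimizes a Lagrangian cost J = D + λR over a small candidate set. The function name, the candidate strengths, the `lam` weight, and the (rate, distortion) measurements are all hypothetical; the abstract does not describe the paper's actual optimization procedure.

```python
def select_filter_strength(candidates, lam):
    """Return the strength whose Lagrangian cost J = D + lam * R is smallest.

    candidates: dict mapping a filter strength to a (rate, distortion) pair
                measured after filtering and encoding (hypothetical numbers).
    lam:        Lagrange multiplier trading rate against distortion.
    """
    def cost(strength):
        rate, distortion = candidates[strength]
        return distortion + lam * rate

    return min(candidates, key=cost)


# Hypothetical measurements: stronger filtering lowers rate, raises distortion.
measurements = {
    0.0: (1000.0, 10.0),
    0.5: (900.0, 11.0),
    1.0: (700.0, 20.0),
}
best = select_filter_strength(measurements, lam=0.02)
```

With a larger `lam`, rate dominates the cost and stronger filtering is preferred; with `lam` close to zero, the unfiltered input wins.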
In this work, we propose a new mixed-input neural network for instance-level monocular object tracking. The proposed Y-like architecture has two inputs and one output. An object ID together with a quaternion representing the object rotation in the previous frame is fed to the first input, whereas the object sub-window is fed to the second input. We demonstrate that, on the basis of the quaternions, the neural network learns attention blobs representing the object rotation in the previous frame. A single neural network has been trained for six objects to estimate their fiducial points in sequences of RGB images. A tracking-by-optimization approach was used in the experiments. The algorithm has been evaluated on the OPT benchmark dataset for 6DoF object pose tracking as well as on a custom dataset including image sequences with both real and rendered objects.
With the development of aerial platforms, aerial image classification plays an important role in a wide range of remote sensing applications. The number of samples in most aerial image datasets is very limited compared with other computer vision datasets. Unlike many works that use data augmentation to solve this problem, we adopt a novel strategy, called label splitting, to deal with limited samples. Specifically, each sample keeps its original semantic label, and label splitting assigns it a new appearance label via unsupervised clustering. Optimized triplet loss learning is then applied to distill domain-specific knowledge. This is achieved through binary tree forest partitioning and a triplet selection and optimization scheme that controls triplet quality. Simulation results on the NWPU, UCM and AID datasets demonstrate that the proposed solution achieves state-of-the-art performance in aerial image classification.
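The label splitting idea can be sketched as follows. This is only an illustration under stated assumptions: the abstract says "unsupervised clustering" without naming a method, so the tiny k-means below, the choice to cluster within each semantic class, the `(semantic label, cluster id)` format, and the toy features are all hypothetical.

```python
from collections import defaultdict


def kmeans(points, k, iters=20):
    """Tiny k-means on tuples of floats; centers start at the first k points."""
    centers = [points[i] for i in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if it has none.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assign


def split_labels(features, semantic_labels, k=2):
    """Give every sample a (semantic label, cluster id) appearance label."""
    by_class = defaultdict(list)
    for idx, label in enumerate(semantic_labels):
        by_class[label].append(idx)
    appearance = [None] * len(features)
    for label, idxs in by_class.items():
        clusters = kmeans([features[i] for i in idxs], min(k, len(idxs)))
        for i, c in zip(idxs, clusters):
            appearance[i] = (label, c)
    return appearance


# Toy 2-D features; real features would come from an image descriptor or CNN.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0), (9.1, 0.1)]
semantic = ["forest", "forest", "forest", "forest", "river", "river"]
appearance = split_labels(features, semantic)
```

In this toy run the four "forest" samples split into two appearance groups, which a triplet-selection stage could then use to pick harder positives and negatives.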
Stereo depth estimation depends on optimal correspondence matching between the pixels of a stereo image pair to infer depth. In this paper, we revisit the stereo depth estimation problem with a simple dual convolutional neural network (CNN) based on EfficientNet that avoids constructing a cost volume for stereo matching. This is achieved by using different weights in the otherwise identical towers of the CNN. The proposed algorithm is dubbed SDE-DualENet. The architecture of SDE-DualENet eliminates the construction of a cost volume by learning to match correspondences between pixels with a different set of weights in each of the dual towers. Results are demonstrated on complex scenes with fine details and large depth variations. The SDE-DualENet depth prediction network outperforms state-of-the-art monocular and stereo depth estimation methods, both qualitatively and quantitatively, on the challenging Scene Flow dataset. The code and pre-trained models will be made publicly available.
Existing cross-component video coding technologies have shown great potential for improving coding efficiency. The fundamental insight of cross-component coding is to exploit the statistical correlations among different color components. In this paper, a Cross-Component Sample Offset (CCSO) approach for image and video coding is proposed, inspired by the observation that the luma component tends to contain more texture, while the chroma components are relatively smoother. The key component of CCSO is a non-linear offset mapping mechanism implemented as a look-up table (LUT). The input of the mapping is the co-located reconstructed samples of the luma component, and the output is the offset values applied to the chroma component. The proposed method has been implemented on top of a recent version of libaom. Experimental results show that the proposed approach brings a 1.16% Random Access (RA) BD-rate saving on top of AV1 with a marginal increase in encoding and decoding time.
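A minimal sketch of a LUT-based offset mapping of this kind is shown below. It is not the libaom implementation: the two-tap horizontal neighborhood, the three-level quantizer, the `step` threshold, the 4:4:4 sampling, and the LUT contents are all assumptions chosen to keep the example short.

```python
def quantize(delta, step):
    """Map a luma difference to -1, 0, or 1 using a dead zone of width step."""
    if delta > step:
        return 1
    if delta < -step:
        return -1
    return 0


def ccso_like_offset(luma, chroma, lut, step=2, max_val=255):
    """Add LUT offsets, indexed by local luma differences, to chroma samples.

    luma, chroma: equally sized 2-D lists (4:4:4 assumed for simplicity).
    lut:          dict keyed by (d_left, d_right) with d in {-1, 0, 1};
                  missing keys mean "no offset".
    """
    height, width = len(luma), len(luma[0])
    out = [row[:] for row in chroma]
    for y in range(height):
        for x in range(width):
            center = luma[y][x]
            left = luma[y][max(x - 1, 0)]            # clamp at picture borders
            right = luma[y][min(x + 1, width - 1)]
            key = (quantize(left - center, step), quantize(right - center, step))
            offset = lut.get(key, 0)
            # Clip the corrected chroma sample to the valid range.
            out[y][x] = min(max(chroma[y][x] + offset, 0), max_val)
    return out


luma = [[10, 10, 50, 50]]
chroma = [[128, 128, 128, 255]]
lut = {(0, 1): 3, (-1, 0): -2, (0, 0): 1}   # hypothetical offsets
result = ccso_like_offset(luma, chroma, lut)
```

The mapping is non-linear because the offset depends only on the quantized pattern of luma differences, not on their magnitude; in a real codec the LUT entries would be chosen by the encoder and signaled to the decoder.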
Adding subtle perturbations to an image can cause the classification model to misclassify, and such images are called adversarial examples. Adversarial examples threaten the safe use of deep neural networks, but when ...
Recently, network-based image Compressive Sensing (ICS) algorithms have shown superior reconstruction quality and speed, yet they are not interpretable. Herein, we propose an Adaptive Threshold-based Sparse Representation Reconstruction Network (ATSR-Net), composed of the Convolutional Sparse Representation subnet (CSR-subnet) and the truly Adaptive Threshold Generation subnet (ATG-subnet). The traditional iterations are unfolded into several CSR-subnets, which can fully exploit local and nonlocal similarities. The ATG-subnet automatically determines a threshold map based on the intrinsic characteristics of the image for flexible feature selection. Moreover, we present a three-level consistency loss, defined at the pixel, measurement, and feature levels, to accelerate network convergence. Extensive experimental results demonstrate that the proposed network outperforms existing state-of-the-art methods by large margins, both quantitatively and qualitatively.
We propose an effective framework for human action recognition on raw depth maps. We leverage a convolutional autoencoder to extract frame features from sequences of depth maps; these are then fed to a 1D-CNN responsible for embedding action features. A Siamese neural network, trained on a single representative depth map for each sequence, extracts features that are then processed by a shapelets algorithm to extract action features. These features are then concatenated with features extracted by a BiLSTM with a TimeDistributed wrapper. Given the individual models learned on such features, we select a subset of the models. We demonstrate experimentally that on the SYSU 3DHOI dataset the proposed algorithm considerably outperforms all recent algorithms, including skeleton-based ones.
Camera calibration for sports videos enables precise and natural delivery of graphics on video footage, as well as several other special effects. This in turn substantially improves the visual experience of the audience and facilitates sports analysis during or after the live show. In this paper, we propose a high-accuracy camera calibration method for sports videos. First, we generate a homography database by uniformly sampling camera parameters. This database includes more than 91 thousand different homography matrices. Then, we use a conditional generative adversarial network (cGAN) to perform semantic segmentation, splitting the broadcast frames into four classes. In a subsequent processing step, we build an effective feature extraction network to extract features from the semantically segmented images. After that, we search the database with the extracted feature to find the best-matching homography. Finally, we refine the homography by image alignment. In a comprehensive evaluation using the 2014 World Cup dataset, our method outperforms other state-of-the-art techniques.
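The database lookup step can be pictured with the sketch below: a nearest-neighbor search over stored feature vectors, followed by applying the retrieved homography to a point. The function names, the squared Euclidean distance, and the two-entry database are hypothetical; the abstract does not specify the feature metric or the search structure, and a database of this size would in practice call for an indexed search rather than a linear scan.

```python
def nearest_homography(query, database):
    """Return the homography whose stored feature is closest to the query.

    database: list of (feature, homography) pairs; features are equal-length
              tuples of floats and homographies are 3x3 nested lists.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, homography = min(database, key=lambda entry: sq_dist(query, entry[0]))
    return homography


def project(h, x, y):
    """Apply a 3x3 homography to the point (x, y) in homogeneous coordinates."""
    xp = h[0][0] * x + h[0][1] * y + h[0][2]
    yp = h[1][0] * x + h[1][1] * y + h[1][2]
    wp = h[2][0] * x + h[2][1] * y + h[2][2]
    return xp / wp, yp / wp


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shift = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]]   # translate by (2, 3)
database = [((0.0, 0.0), identity), ((1.0, 1.0), shift)]

best = nearest_homography((0.9, 1.2), database)
point = project(best, 1.0, 1.0)
```

The retrieved matrix is only a starting point; the refinement by image alignment described in the abstract would adjust it further.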
This paper presents a learning-based complexity reduction scheme for Versatile Video Coding (VVC) intra-frame prediction. VVC introduces several novel coding tools that improve the coding efficiency of intra-frame prediction at the cost of high computational effort. We therefore developed an efficient complexity reduction scheme composed of three solutions, based on machine learning and statistical analysis, to reduce the number of intra prediction modes evaluated in the costly Rate-Distortion Optimization (RDO) process. Experimental results demonstrate that the proposed solution provides an 18.32% encoding time saving with a negligible impact on coding efficiency.